GitBench
Three-way: author line with three different values
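Tests the ability to resolve a three-way conflict on the author line, where the base and the two branches each hold a different value. Because fullname is merged first as a fast-forward, the conflict appears only when initials is merged, and the model must choose between the two branch values.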

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'Author: John' > author.txt
  5. git add author.txt
  6. git commit -m 'Initial author'
  7. git checkout -b fullname
  8. echo 'Author: John Smith' > author.txt
  9. git add author.txt
  10. git commit -m 'Use full name'
  11. git checkout main
  12. git checkout -b initials
  13. echo 'Author: J. Doe' > author.txt
  14. git add author.txt
  15. git commit -m 'Use initials'
  16. git checkout main
  17. git merge fullname
  18. git merge initials
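The setup steps can be replayed in a scratch repository to inspect the conflicted file the model is asked to resolve. This is a minimal sketch and is not part of the benchmark harness. It assumes a POSIX shell with git and mktemp available. It also adds settings the original steps do not contain, each marked as an assumption in the comments: HEAD is pointed at main explicitly so the script does not depend on init.defaultBranch, the default conflict-marker style is forced, and --ff-only is used to confirm that the first merge is a fast-forward.

```shell
#!/bin/sh
# Sketch only: replays the benchmark setup steps in a throwaway repository.
set -eu

repo=$(mktemp -d)
cd "$repo"

git init -q
# Assumption (not in the original steps): name the unborn branch "main"
# explicitly so the script works whatever init.defaultBranch is set to.
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'
# Assumption: force the default marker style in case diff3/zdiff3 is set globally.
git config merge.conflictStyle merge

echo 'Author: John' > author.txt
git add author.txt
git commit -q -m 'Initial author'

git checkout -q -b fullname
echo 'Author: John Smith' > author.txt
git add author.txt
git commit -q -m 'Use full name'

git checkout -q main
git checkout -q -b initials
echo 'Author: J. Doe' > author.txt
git add author.txt
git commit -q -m 'Use initials'

git checkout -q main
# Assumption: --ff-only added to confirm that main has not moved since the
# branches were created, so merging fullname is a fast-forward.
git merge -q --ff-only fullname
# Both sides changed the same line relative to 'Author: John', so this merge
# stops with a conflict and exits non-zero; "|| true" keeps set -e from aborting.
git merge initials >/dev/null 2>&1 || true

cat author.txt
# <<<<<<< HEAD
# Author: John Smith
# =======
# Author: J. Doe
# >>>>>>> initials
```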
Prompt
Resolve the merge conflict in author.txt. Three different authors were set: base 'Author: John', branch A 'Author: John Smith', branch B 'Author: J. Doe'. Both branches were merged into main (fullname first, then initials). Keep the value from the second merge side. Provide ONLY the resolved file content.
Expected
Author: J. Doe
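The prompt says to keep the value from the second merge side. In Git's terms that is the "theirs" side of the conflict, because initials is the branch being merged into HEAD. In a live repository, running git checkout --theirs -- author.txt and then git add author.txt does this for the whole file. The sketch below works on the conflicted text alone and uses a hypothetical helper, keep_theirs. It assumes the file contains Git's default conflict markers.

```shell
#!/bin/sh
# Sketch only: keep_theirs is a hypothetical helper, not part of GitBench or Git.
# It resolves conflict hunks by keeping the "theirs" side (the branch being merged).
set -eu

keep_theirs() {
  awk '
    /^<<<<<<< / { side = "ours"; next }                     # hunk start: HEAD side follows
    /^=======$/ && side == "ours" { side = "theirs"; next } # switch to the merged-in side
    /^>>>>>>> / && side == "theirs" { side = ""; next }     # hunk end
    side == "ours" { next }                                 # drop the HEAD (first merge) lines
    { print }                                               # keep theirs lines and context
  ' "$1"
}

# Conflicted author.txt as left by "git merge initials" in the setup steps.
conflicted=$(mktemp)
cat > "$conflicted" <<'EOF'
<<<<<<< HEAD
Author: John Smith
=======
Author: J. Doe
>>>>>>> initials
EOF

keep_theirs "$conflicted"   # prints: Author: J. Doe
```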
deepseek/deepseek-v4-flash:high PASS 100% 104 in → 207 out (214 reasoning)
Author: J. Doe
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 106 in → 235 out (217 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) { "resolved_content": "Author: J. Doe\n" }
deepseek/deepseek-v4-flash:none PASS 100% 104 in → 6 out (0 reasoning)
Author: J. Doe
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 104 in → 17 out (0 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) { "resolved_content": "Author: J. Doe" }
mistralai/devstral-2512 PASS 100% 106 in → 6 out
Author: J. Doe
mistralai/devstral-2512__json_schema PASS 100% 106 in → 13 out
Author: J. Doe
JSON Schema Structured Output
(raw) {"resolved_content": "Author: J. Doe"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 119 in → 414 out (429 reasoning)
Author: J. Doe
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 119 in → 697 out (740 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) { "resolved_content": "Author: J. Doe" }
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 119 in → 17 out (0 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) { "resolved_content": "Author: J. Doe\n" }
poolside/laguna-xs.2:high PASS 100% 147 in → 131 out (123 reasoning)
Author: J. Doe
poolside/laguna-xs.2:high__json_schema PASS 100% 147 in → 178 out (165 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) {"resolved_content": "Author: J. Doe"}
poolside/laguna-xs.2:none PASS 100% 147 in → 7 out (0 reasoning)
Author: J. Doe
poolside/laguna-xs.2:none__json_schema PASS 100% 147 in → 13 out (0 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) {"resolved_content": "Author: J. Doe"}
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 119 in → 5 out (0 reasoning)
Author: John Smith
Failure: Expected 'Author: J. Doe', got 'Author: John Smith'