GitBench
Three-way: author line with three different values
Tests the ability to resolve a three-way conflict on the author line, where the base, incoming, and current versions each hold a different value. Evaluates merge decisions that involve more than two competing values.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'Author: John' > author.txt
  5. git add author.txt
  6. git commit -m 'Initial author'
  7. git checkout -b fullname
  8. echo 'Author: John Smith' > author.txt
  9. git add author.txt
  10. git commit -m 'Use full name'
  11. git checkout main
  12. echo 'Author: J. Doe' > author.txt
  13. git add author.txt
  14. git commit -m 'Use initials'
  15. git cherry-pick $(git rev-parse fullname)
Prompt
Resolve the cherry-pick conflict in author.txt. The base was 'Author: John'. The incoming commit (fullname branch) changed it to 'Author: John Smith'. The current branch (main) has 'Author: J. Doe'. Keep the current branch value. Provide ONLY the resolved file content.
Expected
Author: J. Doe
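For reference, the same resolution can be reproduced with Git itself. The sketch below replays the setup commands in a throwaway directory and then keeps the current-branch side of the conflict. The temporary directory, the `-q` flags, and the `git symbolic-ref` call (which forces the initial branch name to `main`, something the setup commands take for granted) are assumptions added for illustration and are not part of the benchmark.

```shell
set -eu

# Assumption: work in a throwaway directory so no existing repository is touched.
repo=$(mktemp -d)
cd "$repo"

git init -q
# Assumption: force the initial branch to be called main, since the
# setup commands later run `git checkout main`.
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'

echo 'Author: John' > author.txt
git add author.txt
git commit -q -m 'Initial author'

git checkout -q -b fullname
echo 'Author: John Smith' > author.txt
git add author.txt
git commit -q -m 'Use full name'

git checkout -q main
echo 'Author: J. Doe' > author.txt
git add author.txt
git commit -q -m 'Use initials'

# The cherry-pick stops on the conflict and exits non-zero, so tolerate that.
git cherry-pick "$(git rev-parse fullname)" >/dev/null 2>&1 || true

# During a cherry-pick, "ours" is the current branch (main).
git checkout --ours -- author.txt
git add author.txt

cat author.txt   # prints: Author: J. Doe
```

Because the resolved file is identical to what `main` already contains, the cherry-pick ends up with no changes to commit. Finishing it would likely mean `git cherry-pick --skip` (in Git versions that support it) or `git commit --allow-empty` rather than a plain `git cherry-pick --continue`.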
deepseek/deepseek-v4-flash:high PASS 100% 107 in → 43 out (42 reasoning)
Author: J. Doe
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 107 in → 93 out (76 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) { "resolved_content": "Author: J. Doe" }
deepseek/deepseek-v4-flash:none PASS 100% 106 in → 6 out (0 reasoning)
Author: J. Doe
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 112 in → 14 out (0 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) { "resolved_content": "Author: J. Doe" }
mistralai/devstral-2512 PASS 100% 109 in → 6 out
Author: J. Doe
mistralai/devstral-2512__json_schema PASS 100% 109 in → 13 out
Author: J. Doe
JSON Schema Structured Output
(raw) {"resolved_content": "Author: J. Doe"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 122 in → 106 out (107 reasoning)
Author: J. Doe
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 121 in → 266 out (282 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) { "resolved_content": "Author: J. Doe" }
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 123 in → 16 out (0 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) { "resolved_content": "Author: J. Doe" }
poolside/laguna-xs.2:high PASS 100% 151 in → 163 out (155 reasoning)
Author: J. Doe
poolside/laguna-xs.2:high__json_schema PASS 100% 152 in → 176 out (163 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) {"resolved_content": "Author: J. Doe"}
poolside/laguna-xs.2:none PASS 100% 151 in → 7 out (0 reasoning)
Author: J. Doe
poolside/laguna-xs.2:none__json_schema PASS 100% 152 in → 17 out (0 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) { "resolved_content": "Author: J. Doe" }
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 122 in → 8 out (0 reasoning)
Author: J. Doe ```
Failure: Expected 'Author: J. Doe', got 'Author: J. Doe ```'
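The failure message suggests the answer is compared to the expected string exactly, so a stray trailing code fence is enough to fail the case. The sketch below illustrates that under stated assumptions: the benchmark's actual grader is not shown here, and `normalize` is a hypothetical helper, not something GitBench is known to provide.

```shell
# Hypothetical helper: strip one trailing Markdown code fence and any
# trailing whitespace from a model answer.
normalize() {
  printf '%s' "$1" | sed -e 's/[[:space:]]*```[[:space:]]*$//' -e 's/[[:space:]]*$//'
}

expected='Author: J. Doe'
actual='Author: J. Doe ```'

# Strict comparison, as the failure message implies.
if [ "$actual" = "$expected" ]; then
  echo "strict: PASS"
else
  echo "strict: FAIL"
fi

# Comparison after the hypothetical normalization.
if [ "$(normalize "$actual")" = "$expected" ]; then
  echo "normalized: PASS"
else
  echo "normalized: FAIL"
fi
```

Under strict matching the answer fails even though the file content itself is right. Whether a grader should forgive such formatting residue is a design choice, since the prompt asks for ONLY the resolved file content.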