GitBench
Single-line version number conflict
Tests ability to resolve a version number merge conflict. Evaluates semantic versioning awareness in conflict resolution.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'VERSION=1.0.0' > version.txt
  5. git add version.txt
  6. git commit -m 'Initial version'
  7. git checkout -b release
  8. echo 'VERSION=2.0.0' > version.txt
  9. git add version.txt
  10. git commit -m 'Bump major version'
  11. git checkout main
  12. echo 'VERSION=1.1.0' > version.txt
  13. git add version.txt
  14. git commit -m 'Bump minor version'
  15. git merge release
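Both branches change the same line of version.txt, so the final merge stops with a conflict. Assuming the initial branch is named main and Git's default conflict style is in use, the file would hold both candidate lines between conflict markers. The Python sketch below splits such a file into its two sides. The function name `split_conflict` and the sample text are illustrative and are not part of the benchmark.

```python
def split_conflict(text: str) -> tuple[list[str], list[str]]:
    """Return (ours, theirs) lines from a file containing one conflict hunk."""
    ours: list[str] = []
    theirs: list[str] = []
    target = None  # the side the current line belongs to, if any
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            target = ours
        elif line.startswith("======="):
            target = theirs
        elif line.startswith(">>>>>>> "):
            target = None
        elif target is not None:
            target.append(line)
    return ours, theirs


# Assumed content of version.txt after the conflicting merge
# (default conflict style, HEAD on main, merging release).
CONFLICTED = """<<<<<<< HEAD
VERSION=1.1.0
=======
VERSION=2.0.0
>>>>>>> release
"""

ours, theirs = split_conflict(CONFLICTED)
print("main:", ours)
print("release:", theirs)
```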
Prompt
Resolve the merge conflict in version.txt. The base version was 'VERSION=1.0.0'. Branch A (release) changed it to 'VERSION=2.0.0'. Branch B (main) changed it to 'VERSION=1.1.0'. Provide ONLY the resolved file content.
Expected
VERSION=2.0.0
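The expected answer is consistent with keeping the higher of the two version numbers. That policy is an assumption here, since the page does not state the rationale. A minimal Python sketch of it follows. The helper names `parse_version` and `resolve_version` are hypothetical, and the parser handles only plain MAJOR.MINOR.PATCH values.

```python
def parse_version(line: str) -> tuple[int, int, int]:
    """Parse 'VERSION=MAJOR.MINOR.PATCH' into a tuple of integers."""
    _, _, value = line.strip().partition("=")
    major, minor, patch = (int(part) for part in value.split("."))
    return major, minor, patch


def resolve_version(ours: str, theirs: str) -> str:
    """Keep the side with the higher version (assumed policy, not the benchmark's stated rule)."""
    return max(ours, theirs, key=parse_version).strip()


resolved = resolve_version("VERSION=1.1.0", "VERSION=2.0.0")
print(resolved)
```

Comparing integer tuples rather than strings keeps the ordering numeric, so 1.10.0 sorts above 1.9.0.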
deepseek/deepseek-v4-flash:high PASS 100% 109 in → 634 out (604 reasoning)
VERSION=2.0.0
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 111 in → 543 out (524 reasoning)
VERSION=2.0.0
JSON Schema Structured Output
(raw) { "resolved_content": "VERSION=2.0.0" }
deepseek/deepseek-v4-flash:none PASS 100% 109 in → 8 out (0 reasoning)
VERSION=2.0.0
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 111 in → 19 out (0 reasoning)
VERSION=2.0.0
JSON Schema Structured Output
(raw) { "resolved_content": "VERSION=2.0.0\n" }
mistralai/devstral-2512 PASS 100% 109 in → 9 out
VERSION=2.0.0
mistralai/devstral-2512__json_schema PASS 100% 109 in → 16 out
VERSION=2.0.0
JSON Schema Structured Output
(raw) {"resolved_content": "VERSION=2.0.0"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 122 in → 543 out (517 reasoning)
VERSION=2.0.0
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 122 in → 423 out (408 reasoning)
VERSION=2.0.0
JSON Schema Structured Output
(raw) { "resolved_content": "VERSION=2.0.0" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 122 in → 9 out (0 reasoning)
VERSION=2.0.0
poolside/laguna-xs.2:high PASS 100% 148 in → 574 out (564 reasoning)
VERSION=2.0.0
poolside/laguna-xs.2:high__json_schema PASS 100% 148 in → 474 out (459 reasoning)
VERSION=2.0.0
JSON Schema Structured Output
(raw) {"resolved_content": "VERSION=2.0.0"}
poolside/laguna-xs.2:none PASS 100% 148 in → 9 out (0 reasoning)
VERSION=2.0.0
poolside/laguna-xs.2:none__json_schema PASS 100% 148 in → 14 out (0 reasoning)
VERSION=2.0.0
JSON Schema Structured Output
(raw) {"resolved_content": "VERSION=2.0.0"}
nvidia/nemotron-3-nano-30b-a3b:none__json_schema FAIL 0% 122 in → 19 out (0 reasoning)
VERSION=1.1.0
JSON Schema Structured Output
(raw) { "resolved_content": "VERSION=1.1.0" }
Failure: Expected 'VERSION=2.0.0', got 'VERSION=1.1.0'
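The grader itself is not shown on this page. The sketch below is one guess at a check that would reproduce the results above: read `resolved_content` from a structured response, trim surrounding whitespace (inferred from the passing run whose raw output ends in a newline), and compare against the expected string. The names `extract_content` and `grade` are hypothetical.

```python
import json

EXPECTED = "VERSION=2.0.0"


def extract_content(raw: str) -> str:
    """Return the resolved file content from plain text or a JSON response."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return raw.strip()  # plain-text answer
    if isinstance(payload, dict) and "resolved_content" in payload:
        return str(payload["resolved_content"]).strip()
    return raw.strip()


def grade(raw: str, expected: str = EXPECTED) -> tuple[bool, str]:
    """Return (passed, failure_message); trimming whitespace is an assumption."""
    actual = extract_content(raw)
    if actual == expected:
        return True, ""
    return False, f"Expected '{expected}', got '{actual}'"


print(grade(r'{ "resolved_content": "VERSION=2.0.0\n" }'))
print(grade(r'{ "resolved_content": "VERSION=1.1.0" }'))
```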