GitBench
Single-line version number conflict
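Tests the ability to resolve a version-number conflict during a rebase. Evaluates handling of semantic-version clashes under rebase polarity: during a rebase, "ours" is the branch being rebased onto and "theirs" is the commit being replayed, which is the reverse of a merge.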

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'VERSION=1.0.0' > version.txt
  5. git add version.txt
  6. git commit -m 'Initial version'
  7. git checkout -b release
  8. echo 'VERSION=2.0.0' > version.txt
  9. git add version.txt
  10. git commit -m 'Bump major version'
  11. git checkout main
  12. echo 'VERSION=1.1.0' > version.txt
  13. git add version.txt
  14. git commit -m 'Bump minor version'
  15. git rebase release
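A minimal POSIX shell sketch that replays the setup steps above in a throwaway directory, then prints the three index stages of the conflicted file to make the rebase polarity visible. It assumes git is on PATH and that `mktemp -d` is available. One line is not part of the original setup: `git symbolic-ref HEAD refs/heads/main`, added as an assumption so that step 11 (`git checkout main`) works even when `init.defaultBranch` is unset or set to `master`. This is an illustration, not the benchmark's own harness.

```shell
# Throwaway repository (assumption: mktemp -d is available).
repo=$(mktemp -d)
cd "$repo" || exit 1

git init -q
# Added assumption: pin the initial branch name to "main" so the later
# "git checkout main" works regardless of init.defaultBranch.
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'

echo 'VERSION=1.0.0' > version.txt
git add version.txt
git commit -q -m 'Initial version'

git checkout -q -b release
echo 'VERSION=2.0.0' > version.txt
git add version.txt
git commit -q -m 'Bump major version'

git checkout -q main
echo 'VERSION=1.1.0' > version.txt
git add version.txt
git commit -q -m 'Bump minor version'

# Replaying main's commit onto release edits the same line from the same
# base, so the rebase stops with a conflict.
git rebase release >/dev/null 2>&1 || echo 'rebase stopped on a conflict'

# Index stages of the conflicted path: :1 = merge base, :2 = "ours",
# :3 = "theirs". During a rebase, "ours" is release (the new base) and
# "theirs" is main's replayed commit.
echo "base:   $(git show :1:version.txt)"
echo "ours:   $(git show :2:version.txt)"
echo "theirs: $(git show :3:version.txt)"
```

Running it should print `base: VERSION=1.0.0`, `ours: VERSION=2.0.0` and `theirs: VERSION=1.1.0`, leaving version.txt with conflict markers.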
Prompt
Resolve the rebase conflict in version.txt. The base version was 'VERSION=1.0.0'. The upstream release branch set it to 'VERSION=2.0.0'. The current branch being rebased (main) set it to 'VERSION=1.1.0'. Keep the current branch value. Provide ONLY the resolved file content.
Expected
VERSION=1.1.0
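One way to reach the expected resolution with git itself, sketched in POSIX shell under the same assumptions as a reproduction of the setup (git on PATH, `mktemp -d` available, initial branch pinned to `main`). The helper `commit_version` is hypothetical and only shortens the setup. Because the current branch's commit is the one being replayed, `--theirs` selects its value.

```shell
# Hypothetical helper: overwrite version.txt with VERSION=<arg1> and commit.
commit_version() {
  echo "VERSION=$1" > version.txt
  git add version.txt
  git commit -q -m "$2"
}

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main   # assumption: pin branch name to main
git config user.email 'test@test.com'
git config user.name 'Test User'

commit_version 1.0.0 'Initial version'
git checkout -q -b release
commit_version 2.0.0 'Bump major version'
git checkout -q main
commit_version 1.1.0 'Bump minor version'

# Expected to stop on the version.txt conflict.
git rebase release >/dev/null 2>&1 || true

# Keep the current branch (main) value: in a rebase the replayed commit is
# "theirs", so --theirs yields VERSION=1.1.0 (--ours would yield 2.0.0).
git checkout --theirs -- version.txt
git add version.txt

# Finish the rebase without opening an editor for the commit message.
GIT_EDITOR=true git rebase --continue >/dev/null

cat version.txt
```

Using `--ours` instead would keep `VERSION=2.0.0` from release, which is exactly the polarity inversion the task description warns about.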
deepseek/deepseek-v4-flash:high PASS 100% 129 in → 106 out (96 reasoning)
VERSION=1.1.0
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 130 in → 141 out (118 reasoning)
VERSION=1.1.0
JSON Schema Structured Output
(raw) { "resolved_content": "VERSION=1.1.0" }
deepseek/deepseek-v4-flash:none PASS 100% 127 in → 9 out (0 reasoning)
VERSION=1.1.0
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 128 in → 19 out (0 reasoning)
VERSION=1.1.0
JSON Schema Structured Output
(raw) { "resolved_content": "VERSION=1.1.0" }
mistralai/devstral-2512 PASS 100% 131 in → 9 out
VERSION=1.1.0
mistralai/devstral-2512__json_schema PASS 100% 131 in → 16 out
VERSION=1.1.0
JSON Schema Structured Output
(raw) {"resolved_content": "VERSION=1.1.0"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 144 in → 171 out (151 reasoning)
VERSION=1.1.0
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 144 in → 107 out (77 reasoning)
VERSION=1.1.0
JSON Schema Structured Output
(raw) { "resolved_content": "VERSION=1.1.0" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 142 in → 9 out (0 reasoning)
VERSION=1.1.0
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 143 in → 19 out (0 reasoning)
VERSION=1.1.0
JSON Schema Structured Output
(raw) { "resolved_content": "VERSION=1.1.0" }
poolside/laguna-xs.2:high PASS 100% 169 in → 236 out (227 reasoning)
VERSION=1.1.0
poolside/laguna-xs.2:high__json_schema PASS 100% 165 in → 209 out (194 reasoning)
VERSION=1.1.0
JSON Schema Structured Output
(raw) {"resolved_content": "VERSION=1.1.0"}
poolside/laguna-xs.2:none PASS 100% 166 in → 9 out (0 reasoning)
VERSION=1.1.0
poolside/laguna-xs.2:none__json_schema PASS 100% 169 in → 18 out (0 reasoning)
VERSION=1.1.0
JSON Schema Structured Output
(raw) { "resolved_content": "VERSION=1.1.0" }