: GitBench
Single-line version number conflict
Tests ability to resolve a version number conflict during cherry-pick. Evaluates handling of semantic versioning clashes between branches.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. 01 git init
  2. 02 git config user.email 'test@test.com'
  3. 03 git config user.name 'Test User'
  4. 04 echo 'VERSION=1.0.0' > version.txt
  5. 05 git add version.txt
  6. 06 git commit -m 'Initial version'
  7. 07 git checkout -b release
  8. 08 echo 'VERSION=2.0.0' > version.txt
  9. 09 git add version.txt
  10. 10 git commit -m 'Bump major version'
  11. 11 git checkout main
  12. 12 echo 'VERSION=1.1.0' > version.txt
  13. 13 git add version.txt
  14. 14 git commit -m 'Bump minor version'
  15. 15 git cherry-pick $(git rev-parse release)
Prompt
Resolve the cherry-pick conflict in version.txt. The base version was 'VERSION=1.0.0'. The incoming commit (release) changed it to 'VERSION=2.0.0'. The current branch (main) has 'VERSION=1.1.0'. Provide ONLY the resolved file content.
Expected
VERSION=2.0.0
Loading campaign evidence…
deepseek/deepseek-v4-flash:high PASS 100% 136 in → 260 out (250 reasoning)
VERSION=2.0.0
deepseek/deepseek-v4-flash:none PASS 100% 121 in → 9 out (0 reasoning)
VERSION=2.0.0
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 121 in → 19 out (0 reasoning)
VERSION=2.0.0
JSON Schema Structured Output
(raw) { "resolved_content": "VERSION=2.0.0\n" }
mistralai/devstral-2512 PASS 100% 122 in → 9 out
VERSION=2.0.0
mistralai/devstral-2512__json_schema PASS 100% 122 in → 16 out
VERSION=2.0.0
JSON Schema Structured Output
(raw) {"resolved_content": "VERSION=2.0.0"}
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 135 in → 472 out (416 reasoning)
VERSION=2.0.0
JSON Schema Structured Output
(raw) { "resolved_content": "VERSION=2.0.0" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 136 in → 9 out (0 reasoning)
VERSION=2.0.0
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 134 in → 19 out (0 reasoning)
VERSION=2.0.0
JSON Schema Structured Output
(raw) { "resolved_content": "VERSION=2.0.0" }
poolside/laguna-xs.2:high PASS 100% 161 in → 305 out (295 reasoning)
VERSION=2.0.0
poolside/laguna-xs.2:high__json_schema PASS 100% 161 in → 194 out (179 reasoning)
VERSION=2.0.0
JSON Schema Structured Output
(raw) {"resolved_content": "VERSION=2.0.0"}
poolside/laguna-xs.2:none PASS 100% 160 in → 9 out (0 reasoning)
VERSION=2.0.0
Invalid JSON. Output:
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
nvidia/nemotron-3-nano-30b-a3b:high FAIL 0% 133 in → 32,768 out (81 reasoning)
(empty output)
Failure: Expected 'VERSION=2.0.0', got ''
poolside/laguna-xs.2:none__json_schema FAIL 0% 159 in → 46 out (0 reasoning)
VERSION=2.0.0 The conflict has been resolved by accepting the incoming release version 2.0.0, as it represents the intended major version bump from the release commit.
JSON Schema Structured Output
(raw) {"resolved_content":"VERSION=2.0.0\n\nThe conflict has been resolved by accepting the incoming release version 2.0.0, as it represents the intended major version bump from the release commit."}
Failure: Expected 'VERSION=2.0.0', got 'VERSION=2.0.0 The conflict has been resolved by accepting the incoming release version 2.0.0, as it represents the intended major version bump from the release commit.'