GitBench
Single-line status message conflict
Tests the ability to resolve a conflict in a single-line status message. The model must choose between conflicting status values from different branches.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'Status: OK' > status.txt
  5. git add status.txt
  6. git commit -m 'Initial status'
  7. git checkout -b fix
  8. echo 'Status: Resolved' > status.txt
  9. git add status.txt
  10. git commit -m 'Fix status'
  11. git checkout main
  12. echo 'Status: Failed' > status.txt
  13. git add status.txt
  14. git commit -m 'Update status'
  15. git cherry-pick $(git rev-parse fix)
Prompt
Resolve the cherry-pick conflict in status.txt. The base was 'Status: OK'. The incoming commit changed it to 'Status: Resolved'. The current branch has 'Status: Failed'. Provide ONLY the resolved file content.
Expected
Status: Resolved
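For readers who want to see the conflict first-hand, the setup and the expected resolution can be sketched as a single script. This is a minimal sketch, not part of the benchmark itself. The scratch directory (`repo`), the `git symbolic-ref` line that forces the branch name to `main`, and the `GIT_EDITOR=true` trick to skip the commit-message editor are all assumptions added for reproducibility.

```shell
#!/bin/sh
# Sketch: reproduce the setup in a throwaway repo, then resolve the conflict.
set -eu

repo=$(mktemp -d)   # hypothetical scratch location, not part of the benchmark
cd "$repo"

git init -q
# Assumption: force the initial branch to be "main" regardless of init.defaultBranch.
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'

echo 'Status: OK' > status.txt
git add status.txt
git commit -q -m 'Initial status'

git checkout -q -b fix
echo 'Status: Resolved' > status.txt
git add status.txt
git commit -q -m 'Fix status'

git checkout -q main
echo 'Status: Failed' > status.txt
git add status.txt
git commit -q -m 'Update status'

# The cherry-pick stops on the conflict and exits non-zero,
# so keep set -e from aborting the script here.
git cherry-pick "$(git rev-parse fix)" >/dev/null 2>&1 || true

# status.txt now holds conflict markers around both versions.
cat status.txt

# Take the incoming side, as the Expected answer does, and finish the cherry-pick.
echo 'Status: Resolved' > status.txt
git add status.txt
GIT_EDITOR=true git cherry-pick --continue >/dev/null

cat status.txt
```

The first `cat` shows the `<<<<<<<`, `=======`, and `>>>>>>>` markers with 'Status: Failed' on the current-branch side and 'Status: Resolved' on the incoming side. The second shows the file after resolution.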
deepseek/deepseek-v4-flash:high PASS 100% 87 in → 385 out (379 reasoning)
Status: Resolved
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 87 in → 375 out (403 reasoning)
Status: Resolved
JSON Schema Structured Output
(raw) { "resolved_content": "Status: Resolved" }
deepseek/deepseek-v4-flash:none PASS 100% 87 in → 4 out (0 reasoning)
Status: Resolved
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 88 in → 17 out (0 reasoning)
Status: Resolved
JSON Schema Structured Output
(raw) { "resolved_content": "Status: Resolved" }
mistralai/devstral-2512 PASS 100% 91 in → 5 out
Status: Resolved
mistralai/devstral-2512__json_schema PASS 100% 88 in → 12 out
Status: Resolved
JSON Schema Structured Output
(raw) {"resolved_content": "Status: Resolved"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 102 in → 420 out (478 reasoning)
Status: Resolved
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 102 in → 872 out (976 reasoning)
Status: Resolved
JSON Schema Structured Output
(raw) { "resolved_content": "Status: Resolved" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 103 in → 5 out (0 reasoning)
Status: Resolved
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 103 in → 15 out (0 reasoning)
Status: Resolved
JSON Schema Structured Output
(raw) { "resolved_content": "Status: Resolved" }
poolside/laguna-xs.2:high PASS 100% 133 in → 242 out (236 reasoning)
Status: Resolved
poolside/laguna-xs.2:high__json_schema PASS 100% 132 in → 388 out (376 reasoning)
Status: Resolved
JSON Schema Structured Output
(raw) {"resolved_content": "Status: Resolved"}
poolside/laguna-xs.2:none FAIL 0% 131 in → 9 out (0 reasoning)
``` Status: Failed ```
Failure: Expected 'Status: Resolved', got ' ``` Status: Failed ``` '
poolside/laguna-xs.2:none__json_schema FAIL 0% 130 in → 10 out (0 reasoning)
Status: Failed
JSON Schema Structured Output
(raw) {"resolved_content": "Status: Failed"}
Failure: Expected 'Status: Resolved', got 'Status: Failed'
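Both failing runs kept the current branch's value ('Status: Failed'), so the Markdown fence in the first one is not what caused the failure. The benchmark's actual grader is not shown on this page. The sketch below uses a hypothetical, more lenient check (`normalize` and `check` are illustrative names) that strips fence markers before comparing, and it still rejects that answer. It assumes single-line answers only.

```shell
#!/bin/sh
# Hypothetical grader sketch: exact match after light normalization.
# This is NOT the benchmark's real grader, only an illustration.
expected='Status: Resolved'

# Drop Markdown fence markers and trim surrounding whitespace (single-line answers only).
normalize() {
    printf '%s' "$1" | sed -e 's/```//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Print PASS if the normalized answer equals the expected content, FAIL otherwise.
check() {
    if [ "$(normalize "$1")" = "$expected" ]; then
        echo PASS
    else
        echo FAIL
    fi
}

check 'Status: Resolved'          # plain correct answer
check '``` Status: Failed ```'    # fenced, but the wrong side of the conflict
check '``` Status: Resolved ```'  # fenced, correct content
```

Under this sketch the first and third calls print PASS and the second prints FAIL: fence stripping would rescue a correct but fenced answer, not one that picked the wrong branch.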