GitBench
Single-line status message conflict
Tests the ability to resolve a conflict on a single-line status message during a rebase. The model must choose between two contradictory values and keep the one the prompt asks for.

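These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history. The final command, git rebase fix, stops on a conflict in status.txt, which is the state the model is asked to resolve.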

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'Status: OK' > status.txt
  5. git add status.txt
  6. git commit -m 'Initial status'
  7. git checkout -b fix
  8. echo 'Status: Resolved' > status.txt
  9. git add status.txt
  10. git commit -m 'Fix status'
  11. git checkout main
  12. echo 'Status: Failed' > status.txt
  13. git add status.txt
  14. git commit -m 'Update status'
  15. git rebase fix
Prompt
Resolve the rebase conflict in status.txt. The base was 'Status: OK'. The upstream fix branch changed it to 'Status: Resolved'. The current branch being rebased (main) changed it to 'Status: Failed'. Keep the current branch value. Provide ONLY the resolved file content.
Expected
Status: Failed
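The expected value can be checked by hand. The following is a minimal sketch, not the benchmark's own harness: it replays the setup commands in a throwaway directory and resolves the conflict in favor of the branch being rebased. Several details are assumptions added for illustration: the `mktemp -d` directory, the `git symbolic-ref` call that forces the initial branch name to `main` (the setup relies on `git checkout main`, which only works if that is the default branch), the `-q` flags, and `GIT_EDITOR=true` to skip the commit-message editor. Note that during a rebase the sides are swapped relative to a merge: `--ours` is the upstream being rebased onto (`fix`) and `--theirs` is the commit being replayed (from `main`).

```shell
#!/bin/sh
set -eu

# Assumption: work in a throwaway directory so nothing existing is touched.
repo=$(mktemp -d)
cd "$repo"

git init -q
# Assumption: force the unborn branch to be named 'main' regardless of init.defaultBranch.
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'

echo 'Status: OK' > status.txt
git add status.txt
git commit -q -m 'Initial status'

git checkout -q -b fix
echo 'Status: Resolved' > status.txt
git add status.txt
git commit -q -m 'Fix status'

git checkout -q main
echo 'Status: Failed' > status.txt
git add status.txt
git commit -q -m 'Update status'

# The rebase stops on the conflict, so a non-zero exit status is expected here.
git rebase fix >/dev/null 2>&1 || true

# In a rebase, '--theirs' is the commit being replayed (main's 'Status: Failed').
git checkout --theirs status.txt
git add status.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

cat status.txt
```

After the rebase completes, `main` sits on top of `fix` and `status.txt` holds the value from the replayed commit, which is what the prompt means by "keep the current branch value".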
deepseek/deepseek-v4-flash:high PASS 100% 120 in → 120 out (115 reasoning)
Status: Failed
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 103 in → 97 out (82 reasoning)
Status: Failed
JSON Schema Structured Output
(raw) { "resolved_content": "Status: Failed" }
deepseek/deepseek-v4-flash:none PASS 100% 104 in → 3 out (0 reasoning)
Status: Failed
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 103 in → 12 out (0 reasoning)
Status: Failed
JSON Schema Structured Output
(raw) { "resolved_content": "Status: Failed" }
mistralai/devstral-2512 PASS 100% 104 in → 4 out
Status: Failed
mistralai/devstral-2512__json_schema PASS 100% 103 in → 11 out
Status: Failed
JSON Schema Structured Output
(raw) {"resolved_content": "Status: Failed"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 117 in → 76 out (84 reasoning)
Status: Failed
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 117 in → 180 out (193 reasoning)
Status: Failed
JSON Schema Structured Output
(raw) { "resolved_content": "Status: Failed" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 117 in → 4 out (0 reasoning)
Status: Failed
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 118 in → 13 out (0 reasoning)
Status: Failed
JSON Schema Structured Output
(raw) { "resolved_content": "Status: Failed\n"}
poolside/laguna-xs.2:high PASS 100% 146 in → 215 out (210 reasoning)
Status: Failed
poolside/laguna-xs.2:high__json_schema PASS 100% 146 in → 180 out (169 reasoning)
Status: Failed
JSON Schema Structured Output
(raw) {"resolved_content": "Status: Failed"}
poolside/laguna-xs.2:none PASS 100% 145 in → 5 out (0 reasoning)
Status: Failed
poolside/laguna-xs.2:none__json_schema PASS 100% 147 in → 15 out (0 reasoning)
Status: Failed
JSON Schema Structured Output
(raw) { "resolved_content": "Status: Failed" }