GitBench
Multi-line address block conflict
Tests the ability to resolve a multi-line address block conflict during a rebase. Evaluates whether partial updates from both sides are combined with the correct polarity.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. printf 'Name: John Doe Email: john@example.com Phone: 555-1234' > contact.txt
  5. git add contact.txt
  6. git commit -m 'Initial contact'
  7. git checkout -b work
  8. printf 'Name: John Doe Email: john@company.com Phone: 555-1234' > contact.txt
  9. git add contact.txt
  10. git commit -m 'Use work email'
  11. git checkout main
  12. printf 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876' > contact.txt
  13. git add contact.txt
  14. git commit -m 'Update contact info'
  15. git rebase work
Prompt
Resolve the rebase conflict in contact.txt. The base had john@example.com and 555-1234. Branch being rebased kept john@example.com but updated the email to john@company.com. Main branch updated both to johndoe@personal.com and 555-9876. Provide ONLY the resolved file content with the most complete updates.
Expected
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
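The prompt's rule ("the most complete updates") can be read as a field-wise three-way merge with a tie-break. The sketch below is one possible reading, not the benchmark's definition. It assumes the file holds one field per line (as the `\n` separators in the JSON-schema responses indicate), resolves each field against the base independently, and, when both sides changed the same field, prefers the side that changed more fields. Because `git rebase work` is run from `main`, Git labels `work` as "ours" and the replayed `main` commit as "theirs" during the conflict. `parse` and `merge_fields` are hypothetical helpers.

```python
# Minimal sketch of a field-wise three-way merge for contact.txt.
# Assumption (not from the benchmark): fields are resolved independently, and on a
# true conflict the side that changed more fields wins (ties go to "ours").


def parse(block: str) -> dict[str, str]:
    """Split 'Key: value' lines into an insertion-ordered dict."""
    fields = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(": ")
        fields[key] = value
    return fields


def merge_fields(base: str, ours: str, theirs: str) -> str:
    b, o, t = parse(base), parse(ours), parse(theirs)
    # Count how many fields each side changed relative to the base.
    ours_changes = sum(o[k] != b[k] for k in b)
    theirs_changes = sum(t[k] != b[k] for k in b)
    merged = []
    for key in b:
        if o[key] == t[key]:
            value = o[key]  # both sides agree (or neither changed it)
        elif o[key] == b[key]:
            value = t[key]  # only "theirs" changed this field
        elif t[key] == b[key]:
            value = o[key]  # only "ours" changed this field
        else:
            # Both changed it: prefer the side with the more complete update.
            value = o[key] if ours_changes >= theirs_changes else t[key]
        merged.append(f"{key}: {value}")
    return "\n".join(merged)


base = "Name: John Doe\nEmail: john@example.com\nPhone: 555-1234"
work = "Name: John Doe\nEmail: john@company.com\nPhone: 555-1234"
main = "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876"

# During `git rebase work` on main: ours = work, theirs = main's replayed commit.
print(merge_fields(base, ours=work, theirs=main))
```

With these inputs the sketch prints the expected block. If the conflict branch instead always preferred "ours", the Email line would come out as `john@company.com`, which is the answer the two failing JSON-schema runs below produced.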
deepseek/deepseek-v4-flash:high PASS 100% 145 in → 710 out (716 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
deepseek/deepseek-v4-flash:none PASS 100% 145 in → 26 out (0 reasoning)
``` Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ```
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 144 in → 33 out (0 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876" }
mistralai/devstral-2512__json_schema PASS 100% 164 in → 33 out
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) {"resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 177 in → 261 out (211 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 176 in → 436 out (377 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 175 in → 26 out (0 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 175 in → 36 out (0 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876" }
poolside/laguna-xs.2:high PASS 100% 202 in → 923 out (896 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
poolside/laguna-xs.2:none PASS 100% 201 in → 26 out (0 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
poolside/laguna-xs.2:none__json_schema PASS 100% 201 in → 32 out (0 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) {"resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876"}
deepseek/deepseek-v4-flash:high__json_schema FAIL 0% 212 in → 1,702 out (1,824 reasoning)
Name: John Doe Email: john@company.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: john@company.com\nPhone: 555-9876" }
Failure: Expected 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ', got 'Name: John Doe Email: john@company.com Phone: 555-9876'
mistralai/devstral-2512 FAIL 0% 163 in → 117 out
Here's the resolved file content with the most complete updates: ``` Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ``` This resolution: 1. Keeps the name unchanged (not in conflict) 2. Takes the email update from the main branch (johndoe@personal.com) as it's more complete than the branch being rebased 3. Takes the phone update from the main branch (555-9876) as it's the only change to the phone number
Failure: Expected 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ', got 'Here's the resolved file content with the most complete updates: ``` Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ``` This resolution: 1. Keeps the name unchanged (not in conflict) 2. Takes the email update from the main branch (johndoe@personal.com) as it's more complete than the branch being rebased 3. Takes the phone update from the main branch (555-9876) as it's the only change to the phone number'
poolside/laguna-xs.2:high__json_schema FAIL 0% 202 in → 299 out (267 reasoning)
Name: John Doe Email: john@company.com Phone: 555-9876
JSON Schema Structured Output
(raw) {"resolved_content": "Name: John Doe\nEmail: john@company.com\nPhone: 555-9876"}
Failure: Expected 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ', got 'Name: John Doe Email: john@company.com Phone: 555-9876'
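The PASS/FAIL lines above are consistent with a scorer that accepts a response wrapped entirely in one code fence (the fenced deepseek-v4-flash:none answer passed) and JSON-schema responses, but rejects extra prose around the answer (the plain devstral-2512 run failed even though its fenced block was correct). The actual scoring code is not shown here, so the following is only a hypothetical checker that reproduces those outcomes. `extract_answer` and `passes` are illustrative names, and collapsing all whitespace is an assumption suggested by the flattened text in the failure messages.

```python
import json
import re

# Hypothetical scorer: NOT the benchmark's actual grading code.
# Matches an output that is entirely wrapped in one ``` fence (optional info string).
FENCE = re.compile(r"^```[\w+-]*\s*(.*?)\s*```$", re.DOTALL)


def extract_answer(raw: str) -> str:
    """Return normalized answer text from a plain, fenced, or JSON-schema response."""
    text = raw.strip()
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        payload = None
    # JSON-schema runs carry the answer in the "resolved_content" field.
    if isinstance(payload, dict) and "resolved_content" in payload:
        text = str(payload["resolved_content"])
    match = FENCE.match(text)
    if match:
        text = match.group(1)
    # Collapse whitespace runs so newline vs. space differences are ignored.
    return " ".join(text.split())


def passes(expected: str, raw: str) -> bool:
    return extract_answer(raw) == extract_answer(expected)


expected = "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876\n"
print(passes(expected, "```\nName: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876\n```"))  # True
print(passes(expected, '{"resolved_content": "Name: John Doe\\nEmail: john@company.com\\nPhone: 555-9876"}'))  # False
print(passes(expected, "Here's the resolved file:\n```\nName: John Doe\n```"))  # False
```

Under this reading, prose before or after the fence is what sinks an otherwise correct answer, while the two JSON-schema failures are genuine content errors (the wrong email).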