GitBench
Multi-line hunk: address block conflict
Tests the ability to resolve a multi-line merge-conflict hunk in an address/contact block, combining partial updates from both sides.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. git config user.email 'test@test.com'
  5. printf 'Name: John Doe Email: john@example.com Phone: 555-1234' > contact.txt
  6. git add contact.txt
  7. git commit -m 'Initial contact'
  8. git checkout -b work
  9. printf 'Name: John Doe Email: john@company.com Phone: 555-1234' > contact.txt
  10. git add contact.txt
  11. git commit -m 'Use work email'
  12. git checkout main
  13. printf 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876' > contact.txt
  14. git add contact.txt
  15. git commit -m 'Update contact info'
  16. git merge work
Prompt
Resolve the merge conflict in contact.txt. The base had john@example.com and 555-1234. Branch A kept john@example.com but updated the email to john@company.com. Branch B updated both to johndoe@personal.com and 555-9876. Provide ONLY the resolved file content with the most complete updates.
Expected
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
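The expected answer can be reproduced with a field-level three-way merge. This is an illustrative sketch, not GitBench's own logic, and it rests on assumptions the task does not state: each line of contact.txt holds one 'Key: value' field (as the JSON outputs below suggest); a field changed on only one side takes that side's value; and a field changed differently on both sides takes the value from the side that changed more fields, one possible reading of "most complete updates". The helpers parse_fields and merge_fields are hypothetical.

```python
from typing import Dict

Fields = Dict[str, str]


def parse_fields(text: str) -> Fields:
    """Parse 'Key: value' lines into an ordered dict (hypothetical helper)."""
    fields: Fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def merge_fields(base: Fields, ours: Fields, theirs: Fields) -> Fields:
    """Field-level three-way merge; assumes all three versions share the same keys."""
    ours_changed = {k for k in base if ours[k] != base[k]}
    theirs_changed = {k for k in base if theirs[k] != base[k]}
    # Assumed tie-break for true conflicts: prefer the side with more changed fields.
    preferred = ours if len(ours_changed) >= len(theirs_changed) else theirs
    merged: Fields = {}
    for key in base:
        if key in ours_changed and key in theirs_changed:
            # Both sides edited this field; identical edits need no tie-break.
            merged[key] = ours[key] if ours[key] == theirs[key] else preferred[key]
        elif key in ours_changed:
            merged[key] = ours[key]
        elif key in theirs_changed:
            merged[key] = theirs[key]
        else:
            merged[key] = base[key]
    return merged


BASE = "Name: John Doe\nEmail: john@example.com\nPhone: 555-1234"
WORK = "Name: John Doe\nEmail: john@company.com\nPhone: 555-1234"      # branch being merged in
MAIN = "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876"  # HEAD during the merge

merged = merge_fields(parse_fields(BASE), parse_fields(MAIN), parse_fields(WORK))
resolved = "\n".join(f"{k}: {v}" for k, v in merged.items())
print(resolved)
```

Phone changes only on main, so it merges cleanly; Email changes on both sides and is the real conflict. Taking work's email at that step instead yields the john@company.com / 555-9876 combination recorded for the failing runs.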
deepseek/deepseek-v4-flash:high PASS 100% 134 in → 913 out (886 reasoning)
``` Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ```
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 136 in → 1,138 out (1,104 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876" }
deepseek/deepseek-v4-flash:none PASS 100% 134 in → 25 out (0 reasoning)
``` Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ```
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 134 in → 33 out (0 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876" }
mistralai/devstral-2512 PASS 100% 150 in → 26 out
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
mistralai/devstral-2512__json_schema PASS 100% 150 in → 33 out
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) {"resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876"}
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 163 in → 26 out (0 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 163 in → 36 out (0 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876" }
poolside/laguna-xs.2:high PASS 100% 188 in → 1,234 out (1,208 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
poolside/laguna-xs.2:high__json_schema PASS 100% 188 in → 1,617 out (1,581 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876" }
poolside/laguna-xs.2:none PASS 100% 188 in → 26 out (0 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
poolside/laguna-xs.2:none__json_schema PASS 100% 188 in → 36 out (0 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876" }
nvidia/nemotron-3-nano-30b-a3b:high FAIL 0% 163 in → 820 out (697 reasoning)
Name: John Doe Email: john@company.com Phone: 555-9876
Failure: Expected 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ', got 'Name: John Doe Email: john@company.com Phone: 555-9876'
nvidia/nemotron-3-nano-30b-a3b:high__json_schema FAIL 0% 163 in → 530 out (453 reasoning)
Name: John Doe Email: john@company.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: john@company.com\nPhone: 555-9876" }
Failure: Expected 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ', got 'Name: John Doe Email: john@company.com Phone: 555-9876'
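The failure messages show the string comparison each run is graded on. How GitBench normalizes replies is not documented here. The sketch below assumes a grader that unwraps the resolved_content field of JSON-schema replies, drops Markdown code-fence lines, and compares with all whitespace collapsed. That assumption is consistent with both the fenced and the JSON replies above passing. The function names and the newline-separated EXPECTED value are hypothetical.

```python
import json


def extract_content(raw: str) -> str:
    """Return resolved_content from a JSON-schema reply, otherwise the raw text."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return raw
    if isinstance(payload, dict) and isinstance(payload.get("resolved_content"), str):
        return payload["resolved_content"]
    return raw


def normalize(text: str) -> str:
    """Drop code-fence lines and collapse every run of whitespace to one space."""
    kept = [line for line in text.splitlines() if not line.strip().startswith("```")]
    return " ".join(" ".join(kept).split())


def grade(raw: str, expected: str) -> bool:
    """Hypothetical pass/fail check: normalized reply must equal normalized expectation."""
    return normalize(extract_content(raw)) == normalize(expected)


# Assumed on-disk form of the expected file (one field per line, trailing newline).
EXPECTED = "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876\n"

json_reply = '{"resolved_content": "Name: John Doe\\nEmail: johndoe@personal.com\\nPhone: 555-9876"}'
fenced_reply = "```\nName: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876\n```"
wrong_reply = "Name: John Doe\nEmail: john@company.com\nPhone: 555-9876"

for reply in (json_reply, fenced_reply, wrong_reply):
    print("PASS" if grade(reply, EXPECTED) else "FAIL")  # PASS, PASS, FAIL
```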