GitBench
Multi-line hunk: address block conflict
Tests the ability to resolve a multi-line hunk conflict in address/contact information. Evaluates whether the model combines partial changes from both sides.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. printf 'Name: John Doe\nEmail: john@example.com\nPhone: 555-1234\n' > contact.txt
  5. git add contact.txt
  6. git commit -m 'Initial contact'
  7. git checkout -b work
  8. printf 'Name: John Doe\nEmail: john@company.com\nPhone: 555-1234\n' > contact.txt
  9. git add contact.txt
  10. git commit -m 'Use work email'
  11. git checkout main
  12. printf 'Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876\n' > contact.txt
  13. git add contact.txt
  14. git commit -m 'Update contact info'
  15. git cherry-pick $(git rev-parse work)
Prompt
Resolve the cherry-pick conflict in contact.txt. The base had john@example.com and 555-1234. The incoming commit kept john@example.com but updated the email to john@company.com. The current branch updated both to johndoe@personal.com and 555-9876. Provide ONLY the resolved file content with the most complete updates.
Expected
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
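The three versions described in the prompt can be compared field by field with the usual three-way rule: if only one side changed a value relative to the base, take that side; if both sides changed it differently, it is a true conflict. The sketch below is illustrative only. The helper names `parse_fields` and `three_way` are hypothetical, and the per-field comparison is a simplification: Git itself merges by lines and hunks, and because the email and phone lines are adjacent it would be expected to report them as one conflicting hunk.

```python
from typing import Dict, Optional


def parse_fields(text: str) -> Dict[str, str]:
    """Parse 'Key: value' lines into a dict (hypothetical helper)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def three_way(base: str, ours: str, theirs: str) -> Optional[str]:
    """Three-way rule for a single value; None marks a true conflict."""
    if ours == theirs:
        return ours      # both sides agree
    if ours == base:
        return theirs    # only the incoming side changed it
    if theirs == base:
        return ours      # only the current side changed it
    return None          # both sides changed it differently


# File contents taken from the setup commands (newlines assumed between fields).
BASE = "Name: John Doe\nEmail: john@example.com\nPhone: 555-1234\n"
OURS = "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876\n"  # main
THEIRS = "Name: John Doe\nEmail: john@company.com\nPhone: 555-1234\n"    # work

base, ours, theirs = map(parse_fields, (BASE, OURS, THEIRS))
merged = {key: three_way(base[key], ours[key], theirs[key]) for key in base}

for key, value in merged.items():
    print(f"{key}: {value if value is not None else 'CONFLICT'}")
```

Under this rule the phone number resolves mechanically to 555-9876, while the email is a genuine conflict between the two branches. The Expected output keeps the current branch's email for that field.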
deepseek/deepseek-v4-flash:high PASS 100% 163 in → 451 out (427 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
deepseek/deepseek-v4-flash:none PASS 100% 147 in → 23 out (0 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 213 in → 32 out (0 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876" }
mistralai/devstral-2512__json_schema PASS 100% 165 in → 33 out
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) {"resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876"}
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 177 in → 36 out (0 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876" }
poolside/laguna-xs.2:high__json_schema PASS 100% 203 in → 1,043 out (1,010 reasoning)
Name: John Doe Email: johndoe@personal.com Phone: 555-9876
JSON Schema Structured Output
(raw) {"resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876\n"}
deepseek/deepseek-v4-flash:high__json_schema FAIL 0% 147 in → 1,003 out (973 reasoning)
Name: John Doe Email: john@company.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: john@company.com\nPhone: 555-9876" }
Failure: Expected 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ', got 'Name: John Doe Email: john@company.com Phone: 555-9876'
mistralai/devstral-2512 FAIL 0% 165 in → 54 out
Here's the resolved file content with the most complete updates (keeping both the new email and phone number from the current branch): ``` Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ```
Failure: Expected 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ', got 'Here's the resolved file content with the most complete updates (keeping both the new email and phone number from the current branch): ``` Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ```'
nvidia/nemotron-3-nano-30b-a3b:high FAIL 0% 177 in → 563 out (533 reasoning)
Name: John Doe Email: john@company.com Phone: 555-9876
Failure: Expected 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ', got 'Name: John Doe Email: john@company.com Phone: 555-9876'
nvidia/nemotron-3-nano-30b-a3b:high__json_schema FAIL 0% 177 in → 501 out (450 reasoning)
Name: John Doe Email: john@company.com Phone: 555-9876
JSON Schema Structured Output
(raw) { "resolved_content": "Name: John Doe\nEmail: john@company.com\nPhone: 555-9876" }
Failure: Expected 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ', got 'Name: John Doe Email: john@company.com Phone: 555-9876'
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 178 in → 107 out (0 reasoning)
--- contact.txt Name: John Doe Email: johndoe@personal.com Phone: 555-9876 AUTOMERGE_BASE: contact.txt <<<<<<< HEAD Email: johndoe@personal.com Phone: 555-9876 ======= Email: john@company.com Phone: 555-1234 >>>>>>> 5b85e99 (Use work email)
Failure: Expected 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ', got '--- contact.txt Name: John Doe Email: johndoe@personal.com Phone: 555-9876 AUTOMERGE_BASE: contact.txt <<<<<<< HEAD Email: johndoe@personal.com Phone: 555-9876 ======= Email: john@company.com Phone: 555-1234 >>>>>>> 5b85e99 (Use work email)'
poolside/laguna-xs.2:high FAIL 0% 202 in → 413 out (387 reasoning)
Name: John Doe Email: john@company.com Phone: 555-9876
Failure: Expected 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ', got ' Name: John Doe Email: john@company.com Phone: 555-9876 '
poolside/laguna-xs.2:none FAIL 0% 203 in → 25 out (0 reasoning)
Name: John Doe Email: john@company.com Phone: 555-9876
Failure: Expected 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ', got ' Name: John Doe Email: john@company.com Phone: 555-9876 '
poolside/laguna-xs.2:none__json_schema FAIL 0% 201 in → 30 out (0 reasoning)
Name: John Doe Email: john@company.com Phone: 555-9876
JSON Schema Structured Output
(raw) {"resolved_content": "Name: John Doe\nEmail: john@company.com\nPhone: 555-9876"}
Failure: Expected 'Name: John Doe Email: johndoe@personal.com Phone: 555-9876 ', got 'Name: John Doe Email: john@company.com Phone: 555-9876'
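The failure messages above are consistent with a strict comparison against the expected file content, with the `resolved_content` field extracted first for the JSON Schema runs. The following is a minimal sketch of such a check, not GitBench's actual grader. The function names are hypothetical, and the whitespace normalization is an assumption, inferred from the fact that passing structured outputs appear both with and without a trailing newline.

```python
import json

# Expected file content, assuming one field per line.
EXPECTED = "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876\n"


def normalize(text: str) -> str:
    """Collapse whitespace runs and trim the ends (assumed rule, not confirmed)."""
    return " ".join(text.split())


def extract_answer(raw: str, json_schema: bool) -> str:
    """Return the candidate file content from a raw model response."""
    if json_schema:
        # Structured runs wrap the answer in a 'resolved_content' field.
        return json.loads(raw)["resolved_content"]
    return raw


def grade(raw: str, json_schema: bool = False) -> bool:
    """Exact match after normalization; any surrounding prose fails."""
    return normalize(extract_answer(raw, json_schema)) == normalize(EXPECTED)


plain_ok = "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876"
json_ok = r'{"resolved_content": "Name: John Doe\nEmail: johndoe@personal.com\nPhone: 555-9876\n"}'
wrapped = "Here's the resolved file content:\n```\n" + plain_ok + "\n```"
wrong_email = "Name: John Doe\nEmail: john@company.com\nPhone: 555-9876"

print(grade(plain_ok))                   # True
print(grade(json_ok, json_schema=True))  # True
print(grade(wrapped))                    # False
print(grade(wrong_email))                # False
```

A check of this shape would explain why a response with the correct content wrapped in an explanatory sentence and a code fence still scores 0%: the prompt asks for ONLY the resolved file content, and the comparison leaves no room for extra text.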