GitBench
Three-way: multiple files with conflicting and non-conflicting changes
Tests the ability to handle a merge in which some files conflict and others merge cleanly. Evaluates understanding of partial conflicts.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  git init
  git config user.email 'test@test.com'
  git config user.name 'Test User'
  echo 'Title: My Project' > project.txt
  echo 'File A content' > file_a.txt
  echo 'File B content' > file_b.txt
  git add project.txt file_a.txt file_b.txt
  git commit -m 'Initial project'
  git checkout -b branch-a
  echo 'Title: My Project v2' > project.txt
  echo 'File A updated' > file_a.txt
  git add project.txt file_a.txt
  git commit -m 'Update title and file A'
  git checkout main
  echo 'Title: My Project v3' > project.txt
  echo 'File B updated' > file_b.txt
  git add project.txt file_b.txt
  git commit -m 'Update title and file B'
  git merge branch-a
Prompt
Resolve the merge conflict in project.txt. Branch A set Title: My Project v2. Branch B set Title: My Project v3. Both branches also updated different files (file_a and file_b) without conflicts. For the project.txt conflict, the higher version is more appropriate. Provide ONLY the resolved project.txt content.
Expected
Title: My Project v3
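The "higher version" rule from the prompt can be sketched in a few lines of Python. This is an illustration only, not part of the benchmark: the conflict text assumes Git's default conflict style after `git merge branch-a` on main, and the helper names (`split_conflict`, `version_of`, `resolve_higher_version`) are hypothetical.

```python
import re

# Conflict hunk as Git would typically write it to project.txt
# (assumes the default "merge" conflict style; HEAD is main, i.e. v3).
CONFLICTED = """<<<<<<< HEAD
Title: My Project v3
=======
Title: My Project v2
>>>>>>> branch-a
"""


def split_conflict(text):
    """Return (ours, theirs) for a file containing a single conflict hunk."""
    ours, theirs = [], []
    target = None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            target = ours          # lines after this marker belong to HEAD
        elif line.startswith("=======") and target is ours:
            target = theirs        # lines after this marker belong to branch-a
        elif line.startswith(">>>>>>>"):
            target = None          # end of the hunk
        elif target is not None:
            target.append(line)
    return "\n".join(ours), "\n".join(theirs)


def version_of(side):
    """Extract the integer after 'v'; treat a missing version as 0."""
    match = re.search(r"v(\d+)", side)
    return int(match.group(1)) if match else 0


def resolve_higher_version(text):
    """Keep whichever side of the conflict carries the higher version number."""
    ours, theirs = split_conflict(text)
    return max(ours, theirs, key=version_of)


print(resolve_higher_version(CONFLICTED))
```

The sketch handles only the single-hunk case in this test; file_a.txt and file_b.txt need no resolution because each was changed on one branch only.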
deepseek/deepseek-v4-flash:high PASS 100% 106 in → 77 out (69 reasoning)
Title: My Project v3
deepseek/deepseek-v4-flash:none PASS 100% 106 in → 6 out (0 reasoning)
Title: My Project v3
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 108 in → 13 out (0 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) {"resolved_content": "Title: My Project v3"}
mistralai/devstral-2512__json_schema PASS 100% 106 in → 14 out
Title: My Project v3
JSON Schema Structured Output
(raw) {"resolved_content": "Title: My Project v3"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 119 in → 213 out (215 reasoning)
Title: My Project v3
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 119 in → 156 out (157 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) { "resolved_content": "Title: My Project v3" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 119 in → 7 out (0 reasoning)
Title: My Project v3
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 119 in → 18 out (0 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) { "resolved_content": "Title: My Project v3\n" }
poolside/laguna-xs.2:high PASS 100% 150 in → 142 out (133 reasoning)
Title: My Project v3
poolside/laguna-xs.2:high__json_schema PASS 100% 150 in → 204 out (190 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) {"resolved_content": "Title: My Project v3"}
poolside/laguna-xs.2:none PASS 100% 150 in → 8 out (0 reasoning)
Title: My Project v3
poolside/laguna-xs.2:none__json_schema PASS 100% 150 in → 13 out (0 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) {"resolved_content": "Title: My Project v3"}
Invalid JSON. Output: Title: My Project v3
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
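The parse error above is what Python's standard `json` module reports when given bare text instead of a JSON object. The sketch below reproduces that behavior under the assumption that the harness parses responses with `json.loads`; the function name `parse_structured` and the missing-field message are hypothetical, and only the `resolved_content` key is taken from the raw outputs shown on this page.

```python
import json


def parse_structured(raw):
    """Return (resolved_content, None) on success or (None, error message) on failure."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        # str(exc) has the form "Expecting value: line 1 column 1 (char 0)"
        return None, f"Failed to parse structured JSON response: {exc}"
    if not isinstance(payload, dict) or "resolved_content" not in payload:
        # Hypothetical message; the real harness wording is not shown in the source.
        return None, "Missing 'resolved_content' field"
    return payload["resolved_content"], None


# Bare text is not valid JSON, so parsing fails at the very first character.
print(parse_structured("Title: My Project v3"))

# A well-formed object yields the resolved content.
print(parse_structured('{"resolved_content": "Title: My Project v3"}'))
```

Note that the rejected output contains the correct answer; under this assumption the failure is purely one of format.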
mistralai/devstral-2512 FAIL 0% 106 in → 55 out
The resolved `project.txt` content should be: ``` Title: My Project v3 ``` Since the higher version (v3) is more appropriate, we keep the change from Branch B (HEAD in this case) and discard the v2 version from Branch A.
Failure: Expected 'Title: My Project v3', got 'The resolved `project.txt` content should be: ``` Title: My Project v3 ``` Since the higher version (v3) is more appropriate, we keep the change from Branch B (HEAD in this case) and discard the v2 version from Branch A.'
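The failure message suggests an exact-match comparison against the expected string, which is why a correct answer wrapped in explanation scores 0%. A minimal sketch of such a check follows. It is an assumption, not the benchmark's actual grader: `grade` is a hypothetical name, and the whitespace trimming is inferred only from the passing result whose raw output ended in a newline.

```python
EXPECTED = "Title: My Project v3"


def grade(output, expected=EXPECTED):
    """Return (passed, failure message) using an exact match after trimming whitespace."""
    got = output.strip()
    want = expected.strip()
    if got == want:
        return True, None
    return False, f"Expected {want!r}, got {got!r}"


# A trailing newline is tolerated under the trimming assumption.
print(grade("Title: My Project v3\n"))

# Extra explanation around the right answer still fails an exact match.
verbose = (
    "The resolved `project.txt` content should be: "
    "``` Title: My Project v3 ``` Since the higher version (v3) is more appropriate, ..."
)
passed, message = grade(verbose)
print(passed)
```

Under this kind of check, "Provide ONLY the resolved project.txt content" is a hard requirement rather than a style preference.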