GitBench
Three-way: multiple files with conflicting and non-conflicting changes
Tests the ability to handle a rebase in which only some files conflict. Evaluates whether the model distinguishes files that apply cleanly from files that conflict during a rebase.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'Title: My Project' > project.txt
  5. echo 'File A content' > file_a.txt
  6. echo 'File B content' > file_b.txt
  7. git add project.txt file_a.txt file_b.txt
  8. git commit -m 'Initial project'
  9. git checkout -b branch-a
  10. echo 'Title: My Project v2' > project.txt
  11. echo 'File A updated' > file_a.txt
  12. git add project.txt file_a.txt
  13. git commit -m 'Update title and file A'
  14. git checkout main
  15. echo 'Title: My Project v3' > project.txt
  16. echo 'File B updated' > file_b.txt
  17. git add project.txt file_b.txt
  18. git commit -m 'Update title and file B'
  19. git rebase branch-a
Prompt
Resolve the rebase conflict in project.txt. Branch being rebased set Title: My Project v2. Main branch set Title: My Project v3. Both branches also updated different files (file_a and file_b) without conflicts. For the project.txt conflict, the higher version is more appropriate. Provide ONLY the resolved project.txt content.
Expected
Title: My Project v3
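The resolution rule stated in the prompt (keep the higher version) can be sketched in Python. This is an illustration only and not part of the benchmark: the function name `resolve_higher_version`, the sample conflict text, and the placeholder commit label `abc1234` are assumptions. The sketch handles a single conflict hunk whose two sides each carry a `vN` suffix, and it drops any lines outside the hunk.

```python
import re

# Hypothetical conflict text for project.txt. During `git rebase branch-a`,
# HEAD is the tip of branch-a (v2) and the other side is the commit from
# main being replayed (v3). The label `abc1234` is a placeholder, not a
# hash taken from the benchmark.
SAMPLE_CONFLICT = """<<<<<<< HEAD
Title: My Project v2
=======
Title: My Project v3
>>>>>>> abc1234 (Update title and file B)
"""

VERSION_RE = re.compile(r"\bv(\d+)\b")


def resolve_higher_version(conflicted: str) -> str:
    """Return the side of a single conflict hunk with the higher vN number."""
    head_side, incoming_side = [], []
    target = None  # which side the current line belongs to, if any
    for line in conflicted.splitlines():
        if line.startswith("<<<<<<<"):
            target = head_side
        elif line.startswith("=======") and target is head_side:
            target = incoming_side
        elif line.startswith(">>>>>>>"):
            target = None
        elif target is not None:
            target.append(line)

    def version(lines):
        # Sides without a recognizable version sort below any real version.
        match = VERSION_RE.search("\n".join(lines))
        return int(match.group(1)) if match else -1

    winner = head_side if version(head_side) >= version(incoming_side) else incoming_side
    return "\n".join(winner)


print(resolve_higher_version(SAMPLE_CONFLICT))
```

Comparing the parsed integers rather than the raw strings keeps the rule correct once versions reach two digits, where `v10` would otherwise sort below `v9`.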
deepseek/deepseek-v4-flash:high PASS 100% 118 in → 116 out (108 reasoning)
Title: My Project v3
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 118 in → 244 out (225 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) { "resolved_content": " Title: My Project v3\n" }
deepseek/deepseek-v4-flash:none PASS 100% 135 in → 6 out (0 reasoning)
Title: My Project v3
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 121 in → 16 out (0 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) { "resolved_content": "Title: My Project v3" }
mistralai/devstral-2512__json_schema PASS 100% 121 in → 14 out
Title: My Project v3
JSON Schema Structured Output
(raw) {"resolved_content": "Title: My Project v3"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 132 in → 194 out (206 reasoning)
Title: My Project v3
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 134 in → 118 out (126 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) { "resolved_content": "Title: My Project v3" }
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 134 in → 17 out (0 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) { "resolved_content": "Title: My Project v3" }
poolside/laguna-xs.2:high PASS 100% 162 in → 197 out (188 reasoning)
Title: My Project v3
poolside/laguna-xs.2:high__json_schema PASS 100% 164 in → 134 out (120 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) {"resolved_content": "Title: My Project v3"}
poolside/laguna-xs.2:none__json_schema PASS 100% 165 in → 13 out (0 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) {"resolved_content": "Title: My Project v3"}
mistralai/devstral-2512 FAIL 0% 121 in → 32 out
The resolved `project.txt` content should keep the higher version number (v3) from the main branch: ``` Title: My Project v3 ```
Failure: Expected 'Title: My Project v3', got 'The resolved `project.txt` content should keep the higher version number (v3) from the main branch: ``` Title: My Project v3 ```'
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 132 in → 89 out (0 reasoning)
<<<<<<< HEAD Title: My Project v3 >>>>>>> fa53c71 (Update title and file B) Title: My Project v3 ======= Title: My Project v2 >>>>>>> fa53c71 (Update title and file B) ======= Title: My Project v3 >>>>>>> fa53c71 (Update title and file B)
Failure: Expected 'Title: My Project v3', got '<<<<<<< HEAD Title: My Project v3 >>>>>>> fa53c71 (Update title and file B) Title: My Project v3 ======= Title: My Project v2 >>>>>>> fa53c71 (Update title and file B) ======= Title: My Project v3 >>>>>>> fa53c71 (Update title and file B)'
poolside/laguna-xs.2:none FAIL 0% 164 in → 37 out (0 reasoning)
I'll resolve this rebase conflict by choosing the higher version (v3) as specified, since it's more appropriate. ``` Title: My Project v3 ```
Failure: Expected 'Title: My Project v3', got ' I'll resolve this rebase conflict by choosing the higher version (v3) as specified, since it's more appropriate. ``` Title: My Project v3 ``` '
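The failure messages above are consistent with an exact-match comparison against the expected content, with surrounding whitespace ignored and, for the JSON Schema variants, the `resolved_content` field read from the raw output. The following Python sketch shows that kind of check. It is a guess at the behaviour inferred from the results shown here, not GitBench's actual grader, and the function names `grade` and `extract_resolved` are hypothetical.

```python
import json
from typing import Optional

EXPECTED = "Title: My Project v3"


def extract_resolved(raw_json: str) -> str:
    """Read the `resolved_content` field from a structured-output response."""
    return json.loads(raw_json)["resolved_content"]


def grade(expected: str, actual: str) -> Optional[str]:
    """Return None on a pass, or a failure message on a mismatch.

    Assumption: leading and trailing whitespace is ignored, but any extra
    prose, code fences, or conflict markers cause a failure.
    """
    if actual.strip() == expected.strip():
        return None
    return f"Expected '{expected}', got '{actual}'"


# A bare answer passes.
print(grade(EXPECTED, "Title: My Project v3"))

# A structured answer with stray whitespace passes after extraction.
raw = '{"resolved_content": " Title: My Project v3\\n"}'
print(grade(EXPECTED, extract_resolved(raw)))

# An answer wrapped in explanation and a code fence fails.
wrapped = "I'll resolve this rebase conflict.\n```\nTitle: My Project v3\n```"
print(grade(EXPECTED, wrapped))
```

Under this reading, the three failing runs all contain the correct title but add surrounding text or conflict markers, which breaks the "Provide ONLY the resolved project.txt content" requirement in the prompt.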