GitBench
Three-way: multiple files with conflicting and non-conflicting changes
Tests the ability to handle a cherry-pick in which some files conflict and others merge cleanly. Evaluates understanding of partial-conflict scenarios.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'Title: My Project' > project.txt
  5. echo 'File A content' > file_a.txt
  6. echo 'File B content' > file_b.txt
  7. git add project.txt file_a.txt file_b.txt
  8. git commit -m 'Initial project'
  9. git checkout -b branch-a
  10. echo 'Title: My Project v2' > project.txt
  11. echo 'File A updated' > file_a.txt
  12. git add project.txt file_a.txt
  13. git commit -m 'Update title and file A'
  14. git checkout main
  15. echo 'Title: My Project v3' > project.txt
  16. echo 'File B updated' > file_b.txt
  17. git add project.txt file_b.txt
  18. git commit -m 'Update title and file B'
  19. git cherry-pick $(git rev-parse branch-a)
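After step 19 the cherry-pick stops with a conflict in project.txt only. The picked commit (branch-a) and HEAD (main) both changed the title line relative to the initial commit, while file_a.txt was changed only by the picked commit and file_b.txt only by main. (Step 14 assumes the repository's initial branch is named main, for example via `git init -b main` or `init.defaultBranch=main`; with the traditional default name master, `git checkout main` fails.) The Python sketch below is a simplified, file-level model of that outcome, not git's implementation. Real git merges line by line, but for these single-line files the results coincide. `classify` and `Snapshot` are hypothetical names.

```python
from typing import Dict, Optional

# A file snapshot maps path -> full file content (hypothetical type alias).
Snapshot = Dict[str, str]


def classify(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Dict[str, str]:
    """Label each path 'clean' or 'conflict' using a file-level three-way rule.

    For a cherry-pick: base is the parent of the picked commit, theirs is the
    picked commit, and ours is HEAD.
    """
    result: Dict[str, str] = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(path)
        o: Optional[str] = ours.get(path)
        t: Optional[str] = theirs.get(path)
        if o == t or o == b or t == b:
            # Both sides agree, or only one side changed the file: merges cleanly.
            result[path] = "clean"
        else:
            # Both sides changed the file in different ways.
            result[path] = "conflict"
    return result


# Contents after step 8 (base), step 13 (picked commit) and step 18 (HEAD).
base = {
    "project.txt": "Title: My Project\n",
    "file_a.txt": "File A content\n",
    "file_b.txt": "File B content\n",
}
theirs = {**base, "project.txt": "Title: My Project v2\n", "file_a.txt": "File A updated\n"}
ours = {**base, "project.txt": "Title: My Project v3\n", "file_b.txt": "File B updated\n"}

print(classify(base, ours, theirs))
# {'file_a.txt': 'clean', 'file_b.txt': 'clean', 'project.txt': 'conflict'}
```

Only project.txt needs manual resolution; the file_a.txt change is applied cleanly, and file_b.txt keeps main's version because the picked commit never touched it.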
Prompt
Resolve the cherry-pick conflict in project.txt. The incoming commit set Title: My Project v2. The current branch (main) has Title: My Project v3. Both branches also updated different files (file_a and file_b) without conflicts. For the project.txt conflict, the higher version is more appropriate. Provide ONLY the resolved project.txt content.
Expected
Title: My Project v3
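The prompt's resolution rule ("the higher version is more appropriate") can be applied mechanically. Here is a minimal sketch, assuming git's default `merge` conflict style (no `|||||||` base section) and title lines that end in `vN`. The conflicted text and the hash `1234567` are hypothetical stand-ins for what git writes into project.txt, and `resolve_higher_version` is a hypothetical helper.

```python
import re

# Hypothetical conflicted project.txt after step 19 (hash is a placeholder).
CONFLICTED = """<<<<<<< HEAD
Title: My Project v3
=======
Title: My Project v2
>>>>>>> 1234567 (Update title and file A)
"""

VERSION = re.compile(r"v(\d+)\s*$")


def version_of(line: str) -> int:
    """Extract the trailing vN number from a line (0 if absent)."""
    match = VERSION.search(line)
    return int(match.group(1)) if match else 0


def resolve_higher_version(text: str) -> str:
    """Replace each conflict hunk with whichever side carries the higher vN."""
    out, ours, theirs = [], [], []
    state = "normal"
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            state, ours, theirs = "ours", [], []
        elif line == "=======" and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>> ") and state == "theirs":
            best_ours = max((version_of(ln) for ln in ours), default=0)
            best_theirs = max((version_of(ln) for ln in theirs), default=0)
            # Ties keep HEAD's side.
            out.extend(ours if best_ours >= best_theirs else theirs)
            state = "normal"
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
        else:
            out.append(line)
    return "\n".join(out)


print(resolve_higher_version(CONFLICTED))  # Title: My Project v3
```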
deepseek/deepseek-v4-flash:high PASS 100% 123 in → 167 out (160 reasoning)
Title: My Project v3
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 122 in → 97 out (87 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) { "resolved_content": "Title: My Project v3" }
deepseek/deepseek-v4-flash:none PASS 100% 122 in → 7 out (0 reasoning)
Title: My Project v3
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 122 in → 18 out (0 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) { "resolved_content": "Title: My Project v3" }
mistralai/devstral-2512 PASS 100% 124 in → 7 out
Title: My Project v3
mistralai/devstral-2512__json_schema PASS 100% 123 in → 14 out
Title: My Project v3
JSON Schema Structured Output
(raw) {"resolved_content": "Title: My Project v3"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 137 in → 78 out (83 reasoning)
Title: My Project v3
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 137 in → 211 out (242 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) { "resolved_content": "Title: My Project v3" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 136 in → 7 out (0 reasoning)
Title: My Project v3
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 137 in → 16 out (0 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) { "resolved_content":"Title: My Project v3" }
poolside/laguna-xs.2:high PASS 100% 168 in → 192 out (184 reasoning)
Title: My Project v3
poolside/laguna-xs.2:high__json_schema PASS 100% 169 in → 143 out (128 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) {"resolved_content": "Title: My Project v3"}
poolside/laguna-xs.2:none__json_schema PASS 100% 167 in → 15 out (0 reasoning)
Title: My Project v3
JSON Schema Structured Output
(raw) { "resolved_content": "Title: My Project v3"}
poolside/laguna-xs.2:none FAIL 0% 168 in → 63 out (0 reasoning)
To resolve this cherry-pick conflict, I need to choose the higher version as specified in the requirements. Since v3 is higher than v2, I should select the content from the current branch (HEAD). Here is the resolved project.txt content: ``` Title: My Project v3 ```
Failure: Expected 'Title: My Project v3', got ' To resolve this cherry-pick conflict, I need to choose the higher version as specified in the requirements. Since v3 is higher than v2, I should select the content from the current branch (HEAD). Here is the resolved project.txt content: ``` Title: My Project v3 ``` '
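The failing laguna-xs.2:none response contains the correct title but wraps it in explanation and a code fence, so it does not equal the expected content. This page does not show the grader's exact rule. The sketch below assumes an exact string comparison after trimming surrounding whitespace, and it assumes the `__json_schema` variants are graded on the `resolved_content` field. `grade_plain` and `grade_json` are hypothetical names.

```python
import json

EXPECTED = "Title: My Project v3"


def grade_plain(response: str) -> bool:
    """Hypothetical exact-match check that ignores only surrounding whitespace."""
    return response.strip() == EXPECTED


def grade_json(raw: str) -> bool:
    """Hypothetical check for the __json_schema variants: compare resolved_content."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    content = payload.get("resolved_content")
    return isinstance(content, str) and content.strip() == EXPECTED


print(grade_plain("Title: My Project v3"))  # True
print(grade_json('{ "resolved_content":"Title: My Project v3" }'))  # True
# Correct title, but surrounded by prose and a code fence: fails exact match.
print(grade_plain(" Here is the resolved project.txt content: ``` Title: My Project v3 ``` "))  # False
```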