GitBench
Three-way: author line with three different values
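Tests the ability to resolve a three-way conflict on the author line during a rebase. The case probes rebase polarity: during a rebase, the "ours" side (HEAD) is the upstream branch being rebased onto, and the "theirs" side is the commit being replayed from the current branch, which is the reverse of a plain merge.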

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'Author: John' > author.txt
  5. git add author.txt
  6. git commit -m 'Initial author'
  7. git checkout -b fullname
  8. echo 'Author: John Smith' > author.txt
  9. git add author.txt
  10. git commit -m 'Use full name'
  11. git checkout main
  12. echo 'Author: J. Doe' > author.txt
  13. git add author.txt
  14. git commit -m 'Use initials'
  15. git rebase fullname
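The last setup command stops with a conflict in author.txt, because both branches changed the same line. The sketch below illustrates which side is which at that point. It assumes Git's default two-way conflict markers (not the diff3 style); the commit id `abc1234` is a placeholder, and the exact marker label varies by Git version. The `resolve` helper is hypothetical and not part of GitBench. Note that in a rebase the HEAD side holds the upstream `fullname` value, while the replayed commit from `main` appears as "theirs".

```python
from typing import Literal

# Conflicted author.txt as Git would typically leave it after `git rebase fullname`.
# "abc1234" is a placeholder commit id, not a value taken from the benchmark.
CONFLICTED = """\
<<<<<<< HEAD
Author: John Smith
=======
Author: J. Doe
>>>>>>> abc1234 (Use initials)
"""


def resolve(text: str, keep: Literal["ours", "theirs"]) -> str:
    """Keep one side of each two-way conflict block; copy all other lines as-is."""
    out = []
    side = None  # None outside a conflict block, otherwise "ours" or "theirs"
    for line in text.splitlines(keepends=True):
        if line.startswith("<<<<<<< "):
            side = "ours"        # HEAD side: the branch being rebased onto
        elif line.startswith("=======") and side == "ours":
            side = "theirs"      # replayed commit from the branch being rebased
        elif line.startswith(">>>>>>> ") and side == "theirs":
            side = None
        elif side is None or side == keep:
            out.append(line)
    return "".join(out)


# "Keep the current branch value" means keeping the replayed commit: "theirs".
print(resolve(CONFLICTED, keep="theirs"), end="")
```

Passing `keep="ours"` instead would return the `fullname` value, which is the mistake this test case is designed to catch.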
Prompt
Resolve the rebase conflict in author.txt. Base was 'Author: John'. The upstream fullname branch set 'Author: John Smith'. The current branch being rebased (main) set 'Author: J. Doe'. Keep the current branch value. Provide ONLY the resolved file content.
Expected
Author: J. Doe
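How GitBench scores a reply is not shown on this page. As a minimal sketch, assuming an exact match after trimming surrounding whitespace, a check could look like the following. The `resolved_content` key is taken from the raw JSON Schema outputs listed in the results; the function names `extract_content` and `passes` are hypothetical.

```python
import json

EXPECTED = "Author: J. Doe"


def extract_content(raw: str) -> str:
    """Return the resolved file content from a plain or JSON-wrapped reply."""
    text = raw.strip()
    if text.startswith("{"):
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            return text  # not valid JSON: treat the reply as plain file content
        if isinstance(payload, dict) and isinstance(payload.get("resolved_content"), str):
            return payload["resolved_content"].strip()
    return text


def passes(raw: str, expected: str = EXPECTED) -> bool:
    """Assumed scoring rule: exact match after trimming whitespace."""
    return extract_content(raw) == expected.strip()


print(passes("Author: J. Doe\n"))
print(passes('{ "resolved_content": "Author: J. Doe" }'))
print(passes("Author: John Smith"))
```

Under this assumed rule, the first two replies pass and the third, which keeps the upstream value, fails.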
deepseek/deepseek-v4-flash:high PASS 100% 103 in → 95 out (88 reasoning)
Author: J. Doe
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 101 in → 126 out (110 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) { "resolved_content": "Author: J. Doe" }
deepseek/deepseek-v4-flash:none PASS 100% 103 in → 6 out (0 reasoning)
Author: J. Doe
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 104 in → 15 out (0 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) { "resolved_content": "Author: J. Doe" }
mistralai/devstral-2512 PASS 100% 106 in → 6 out
Author: J. Doe
mistralai/devstral-2512__json_schema PASS 100% 105 in → 13 out
Author: J. Doe
JSON Schema Structured Output
(raw) {"resolved_content": "Author: J. Doe"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 116 in → 101 out (118 reasoning)
Author: J. Doe
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 118 in → 118 out (110 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) { "resolved_content": "Author: J. Doe" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 119 in → 6 out (0 reasoning)
Author: J. Doe
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 118 in → 16 out (0 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) { "resolved_content": "Author: J. Doe" }
poolside/laguna-xs.2:high PASS 100% 145 in → 416 out (408 reasoning)
Author: J. Doe
poolside/laguna-xs.2:high__json_schema PASS 100% 148 in → 509 out (498 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) {"resolved_content": "Author: J. Doe"}
poolside/laguna-xs.2:none PASS 100% 147 in → 7 out (0 reasoning)
Author: J. Doe
poolside/laguna-xs.2:none__json_schema PASS 100% 147 in → 12 out (0 reasoning)
Author: J. Doe
JSON Schema Structured Output
(raw) {"resolved_content": "Author: J. Doe"}
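For readers who want to compare the runs above programmatically, the header line of each result can be parsed with a regular expression. This is a sketch only: it assumes the numbers are token counts, and the field names in `Result` are illustrative rather than GitBench's own.

```python
import re
from typing import NamedTuple, Optional


class Result(NamedTuple):
    model: str
    status: str
    score_pct: int
    tokens_in: int             # assumed to be input tokens
    tokens_out: int            # assumed to be output tokens
    reasoning: Optional[int]   # None when the line reports no reasoning count


LINE = re.compile(
    r"^(?P<model>\S+) (?P<status>[A-Z]+) (?P<score>\d+)% "
    r"(?P<tin>\d+) in → (?P<tout>\d+) out"
    r"(?: \((?P<reasoning>\d+) reasoning\))?$"
)


def parse_result(line: str) -> Result:
    """Parse one result header line into a Result record."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    reasoning = m.group("reasoning")
    return Result(
        model=m.group("model"),
        status=m.group("status"),
        score_pct=int(m.group("score")),
        tokens_in=int(m.group("tin")),
        tokens_out=int(m.group("tout")),
        reasoning=int(reasoning) if reasoning is not None else None,
    )


print(parse_result("deepseek/deepseek-v4-flash:high PASS 100% 103 in → 95 out (88 reasoning)"))
```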