GitBench
Mixed naming conventions
Tests the ability to handle mixed branch naming conventions (feature/, bugfix/, hotfix/) when identifying branches that have been merged.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'init' > init.txt
  5. git add init.txt
  6. git commit -m 'Init'
  7. git checkout -b bugfix-123
  8. echo 'fix' > fix.txt
  9. git add fix.txt
  10. git commit -m 'Fix bug 123'
  11. git checkout main
  12. git merge bugfix-123
  13. git checkout -b feature/search
  14. echo 'search' > search.py
  15. git add search.py
  16. git commit -m 'Add search'
  17. git checkout main
  18. git merge feature/search
  19. git checkout -b experiment-ml
  20. echo 'ml' > ml.py
  21. git add ml.py
  22. git commit -m 'ML experiment'
  23. git checkout main
Prompt
Look at the branches in this repository. Which branches should be deleted because they are fully merged into main? List ONLY the branch names to delete, one per line. If none, respond with 'none'.
Expected
bugfix-123 feature/search
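The expected answer can be checked mechanically. The sketch below is not part of the benchmark; it rebuilds the same history in a scratch directory and asks Git which local branches are fully merged into main. It assumes a POSIX shell and a Git recent enough that `git branch` accepts `--format`. The `git symbolic-ref` line and the `-q` flags are additions for illustration: the first forces the initial branch to be named main regardless of the local `init.defaultBranch` setting, the second only silences output.

```shell
#!/bin/sh
# Sketch: recreate the fixture and list branches fully merged into main.
set -e

repo=$(mktemp -d)
cd "$repo"

git init -q
# Assumption: pin the initial branch name to main so 'git checkout main' works.
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'

echo 'init' > init.txt
git add init.txt
git commit -q -m 'Init'

# bugfix-123: committed, then merged into main.
git checkout -q -b bugfix-123
echo 'fix' > fix.txt
git add fix.txt
git commit -q -m 'Fix bug 123'
git checkout -q main
git merge -q bugfix-123

# feature/search: committed, then merged into main.
git checkout -q -b feature/search
echo 'search' > search.py
git add search.py
git commit -q -m 'Add search'
git checkout -q main
git merge -q feature/search

# experiment-ml: committed but never merged.
git checkout -q -b experiment-ml
echo 'ml' > ml.py
git add ml.py
git commit -q -m 'ML experiment'
git checkout -q main

# List local branches reachable from main, excluding main itself.
merged=$(git branch --merged main --format='%(refname:short)' | grep -v -x 'main')
echo "$merged"
```

experiment-ml is absent from the output because its tip commit is not reachable from main. The other two branches appear regardless of whether their names use a slash or a hyphen.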
deepseek/deepseek-v4-flash:high PASS 100% 118 in → 268 out (276 reasoning)
bugfix-123 feature/search
deepseek/deepseek-v4-flash:none PASS 100% 136 in → 7 out (0 reasoning)
bugfix-123 feature/search
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 113 in → 25 out (0 reasoning)
bugfix-123 feature/search
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "bugfix-123", "feature/search" ] }
mistralai/devstral-2512 PASS 100% 130 in → 10 out
bugfix-123 feature/search
mistralai/devstral-2512__json_schema PASS 100% 131 in → 20 out
bugfix-123 feature/search
JSON Schema Structured Output
(raw) {"branches_to_delete": ["bugfix-123", "feature/search"]}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 147 in → 131 out (127 reasoning)
bugfix-123 feature/search
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 145 in → 101 out (77 reasoning)
bugfix-123 feature/search
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "bugfix-123", "feature/search" ] }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 143 in → 10 out (0 reasoning)
bugfix-123 feature/search
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 145 in → 27 out (0 reasoning)
bugfix-123 feature/search
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "bugfix-123", "feature/search" ] }
poolside/laguna-xs.2:high PASS 100% 179 in → 138 out (126 reasoning)
bugfix-123 feature/search
poolside/laguna-xs.2:high__json_schema PASS 100% 181 in → 219 out (200 reasoning)
bugfix-123 feature/search
JSON Schema Structured Output
(raw) {"branches_to_delete": ["bugfix-123", "feature/search"]}
poolside/laguna-xs.2:none PASS 100% 173 in → 11 out (0 reasoning)
bugfix-123 feature/search
poolside/laguna-xs.2:none__json_schema PASS 100% 183 in → 18 out (0 reasoning)
bugfix-123 feature/search
JSON Schema Structured Output
(raw) {"branches_to_delete": ["bugfix-123", "feature/search"]}
JSON Schema Structured Output
Structured Output Error: Invalid JSON.
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)