GitBench
Feature branches merged and unmerged
Tests the ability to distinguish merged from unmerged feature branches. Evaluates selective branch cleanup when merge status is mixed.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'App' > app.py
  5. git add app.py
  6. git commit -m 'Initial app'
  7. git checkout -b feature-auth
  8. echo 'auth' > auth.py
  9. git add auth.py
  10. git commit -m 'Add auth'
  11. git checkout main
  12. git merge feature-auth
  13. git checkout -b feature-db
  14. echo 'db' > db.py
  15. git add db.py
  16. git commit -m 'Add db'
  17. git checkout main
  18. git merge feature-db
  19. git checkout -b feature-api
  20. echo 'api' > api.py
  21. git add api.py
  22. git commit -m 'Add api'
  23. git checkout main
  24. git checkout -b feature-ui
  25. echo 'ui' > ui.py
  26. git add ui.py
  27. git commit -m 'Add ui'
  28. git checkout main
Prompt
Look at the branches in this repository. Which branches should be deleted because they are fully merged into main? List ONLY the branch names to delete, one per line. If none, respond with 'none'.
Expected
feature-auth feature-db
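One way to confirm the expected answer is to ask Git directly which local branches are fully merged into main. The sketch below is an illustration, not part of the benchmark: the helper names parse_branch_listing and merged_branches are hypothetical, and it assumes the repository's default branch is named main, as the setup commands imply. Only feature-auth and feature-db were merged in the setup, so those are the names it should report for this repository.

```python
import subprocess


def parse_branch_listing(listing: str, base: str = "main") -> list[str]:
    """Turn `git branch --merged` output into branch names, excluding the base."""
    names = []
    for line in listing.splitlines():
        # Git prefixes each line with two characters; "* " marks the current branch.
        name = line.lstrip("*+ ").strip()
        if name and name != base:
            names.append(name)
    return sorted(names)


def merged_branches(repo_dir: str, base: str = "main") -> list[str]:
    """Ask git which local branches are fully merged into `base` (needs git on PATH)."""
    result = subprocess.run(
        ["git", "branch", "--merged", base],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_branch_listing(result.stdout, base)
```

The parsing helper is kept separate from the subprocess call so it can be checked without a repository. It does not handle a detached HEAD line, which does not occur in this setup.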
deepseek/deepseek-v4-flash:high PASS 100% 125 in → 156 out (149 reasoning)
feature-auth feature-db
deepseek/deepseek-v4-flash:none PASS 100% 123 in → 7 out (0 reasoning)
feature-auth feature-db
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 201 in → 17 out (0 reasoning)
feature-auth feature-db
JSON Schema Structured Output
(raw) {"branches_to_delete": ["feature-auth", "feature-db"]}
mistralai/devstral-2512 PASS 100% 127 in → 6 out
feature-auth feature-db
mistralai/devstral-2512__json_schema PASS 100% 131 in → 16 out
feature-auth feature-db
JSON Schema Structured Output
(raw) {"branches_to_delete": ["feature-auth", "feature-db"]}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 145 in → 162 out (178 reasoning)
feature-auth feature-db
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 142 in → 297 out (285 reasoning)
feature-auth feature-db
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "feature-auth", "feature-db" ] }
nvidia/nemotron-3-nano-30b-a3b:none__json_schema FAIL 100% 144 in → 18 out (0 reasoning)
feature-auth feature-db feature-ui
JSON Schema Structured Output
(raw) {"branches_to_delete": ["feature-auth","feature-db","feature-ui"]}
Failure: Extra: ['feature-ui']
poolside/laguna-xs.2:high PASS 100% 179 in → 199 out (191 reasoning)
feature-auth feature-db
poolside/laguna-xs.2:high__json_schema PASS 100% 183 in → 268 out (253 reasoning)
feature-auth feature-db
JSON Schema Structured Output
(raw) {"branches_to_delete": ["feature-auth", "feature-db"]}
poolside/laguna-xs.2:none PASS 100% 180 in → 7 out (0 reasoning)
feature-auth feature-db
poolside/laguna-xs.2:none__json_schema PASS 100% 182 in → 27 out (0 reasoning)
feature-auth feature-db
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "feature-auth", "feature-db" ] }
Invalid JSON. Output:
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
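The message "Expecting value: line 1 column 1 (char 0)" is what Python's json module reports when it is given an empty or non-JSON string. The harness code is not shown here, so the following is only a guess at how a structured response might be read: extract_branches is a hypothetical helper, and the branches_to_delete key is taken from the raw outputs above.

```python
import json


def extract_branches(raw: str) -> list[str]:
    """Parse a structured response and return its `branches_to_delete` list."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Wording mirrors the error shown above; the real harness may differ.
        raise ValueError(f"Failed to parse structured JSON response: {exc}") from exc

    branches = payload.get("branches_to_delete") if isinstance(payload, dict) else None
    if not isinstance(branches, list) or not all(isinstance(b, str) for b in branches):
        raise ValueError("branches_to_delete must be a list of strings")
    return branches
```

Passing an empty string reproduces the parse failure, while the compact and pretty-printed raw outputs above both parse to the same list.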
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 147 in → 2 out (0 reasoning)
none
Failure: Missing: ['feature-auth', 'feature-db'] Extra: ['none']
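The Missing and Extra lists in the failure messages are consistent with a plain set comparison between the expected branch names and the names in the answer. A minimal sketch under that assumption follows; the grade function is hypothetical and the actual scoring code may differ. It treats every whitespace-separated token as a branch name, which would explain why the literal answer "none" shows up as an extra branch.

```python
def grade(expected: list[str], answer: str) -> str:
    """Compare an answer against the expected branch names as unordered sets."""
    got = set(answer.split())   # one name per line or space-separated both work
    want = set(expected)

    missing = sorted(want - got)  # expected names the answer left out
    extra = sorted(got - want)    # names the answer should not have listed

    if not missing and not extra:
        return "PASS"

    parts = []
    if missing:
        parts.append(f"Missing: {missing}")
    if extra:
        parts.append(f"Extra: {extra}")
    return "Failure: " + " ".join(parts)
```

Because the comparison is set-based, the order of names in the answer does not matter, and a single unmerged branch such as feature-ui is enough to fail the case.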