GitBench
Hotfix branch merged
Tests the ability to identify a hotfix branch that has been merged into main. Evaluates recognition of hotfix naming conventions during branch cleanup.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'v1.0' > version.txt
  5. git add version.txt
  6. git commit -m 'Release v1.0'
  7. git checkout -b hotfix-security
  8. echo 'v1.0.1-patched' > version.txt
  9. git add version.txt
  10. git commit -m 'Security patch'
  11. git checkout main
  12. git merge hotfix-security
  13. git checkout -b release-v2
  14. echo 'v2.0-dev' > version.txt
  15. git add version.txt
  16. git commit -m 'Start v2.0'
  17. git checkout main
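To check the expected answer by hand, the setup commands can be replayed in a scratch directory, and Git can then be asked which branches are already reachable from main. This is a minimal reproduction sketch, not the benchmark's own harness. It assumes Git 2.28 or newer so that `git init -b main` names the initial branch `main`. The fixture's plain `git init` only yields `main` when `init.defaultBranch` is configured that way; otherwise its `git checkout main` steps would fail. The variables `repo` and `to_delete` are names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: rebuild the fixture in a scratch directory and ask Git which
# local branches are fully merged into main.
set -e

repo=$(mktemp -d)
cd "$repo"

# Assumption: Git 2.28+ so -b pins the initial branch name to main.
git init -q -b main
git config user.email 'test@test.com'
git config user.name 'Test User'

echo 'v1.0' > version.txt
git add version.txt
git commit -q -m 'Release v1.0'

git checkout -q -b hotfix-security
echo 'v1.0.1-patched' > version.txt
git add version.txt
git commit -q -m 'Security patch'
git checkout -q main
git merge -q hotfix-security      # fast-forward: main gains the patch commit

git checkout -q -b release-v2
echo 'v2.0-dev' > version.txt
git add version.txt
git commit -q -m 'Start v2.0'     # this commit is not reachable from main
git checkout -q main

# Branches whose tips are reachable from main, excluding main itself.
to_delete=$(git branch --format='%(refname:short)' --merged main | grep -vx 'main' || true)
echo "${to_delete:-none}"
```

Because main had no new commits when `hotfix-security` was merged, the merge is a fast-forward and both branches end up pointing at the same commit. `release-v2` carries the unmerged 'Start v2.0' commit, so it is listed by `git branch --no-merged main` instead.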
Prompt
Look at the branches in this repository. Which branches should be deleted because they are fully merged into main? List ONLY the branch names to delete, one per line. If none, respond with 'none'.
Expected
hotfix-security
deepseek/deepseek-v4-flash:high PASS 100% 101 in → 164 out (159 reasoning)
hotfix-security
deepseek/deepseek-v4-flash:none PASS 100% 118 in → 3 out (0 reasoning)
hotfix-security
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 102 in → 19 out (0 reasoning)
hotfix-security
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "hotfix-security" ] }
mistralai/devstral-2512 PASS 100% 106 in → 4 out
hotfix-security
mistralai/devstral-2512__json_schema PASS 100% 108 in → 13 out
hotfix-security
JSON Schema Structured Output
(raw) {"branches_to_delete": ["hotfix-security"]}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 119 in → 299 out (319 reasoning)
hotfix-security
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 118 in → 190 out (185 reasoning)
hotfix-security
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "hotfix-security" ] }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 113 in → 4 out (0 reasoning)
hotfix-security
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 119 in → 19 out (0 reasoning)
hotfix-security
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "hotfix-security" ] }
poolside/laguna-xs.2:high PASS 100% 147 in → 132 out (126 reasoning)
hotfix-security
poolside/laguna-xs.2:high__json_schema PASS 100% 146 in → 186 out (174 reasoning)
hotfix-security
JSON Schema Structured Output
(raw) {"branches_to_delete": ["hotfix-security"]}
poolside/laguna-xs.2:none PASS 100% 154 in → 5 out (0 reasoning)
hotfix-security
poolside/laguna-xs.2:none__json_schema PASS 100% 149 in → 11 out (0 reasoning)
hotfix-security
JSON Schema Structured Output
(raw) {"branches_to_delete": ["hotfix-security"]}
deepseek/deepseek-v4-flash:high__json_schema FAIL 0% 102 in → 175 out (160 reasoning)
release-v2
JSON Schema Structured Output
(raw) { "branches_to_delete": ["release-v2"] }
Failure: Missing: ['hotfix-security'] Extra: ['release-v2']
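The failing run's report lists expected branch names the model omitted ("Missing") and names it returned that were not expected ("Extra"). The grader itself is not shown on this page, so the following is only a hypothetical sketch of that comparison. It treats each answer as a set of newline-separated branch names. The `score` function and its output format are illustrative and are not GitBench's own.

```shell
#!/bin/sh
# Hypothetical scorer: compare expected vs. returned branch names as sets.
# $1 = expected answer, $2 = model answer (one branch name per line).
score() {
  exp_file=$(mktemp)
  act_file=$(mktemp)
  # Drop blank lines and deduplicate; comm requires sorted input.
  printf '%s\n' "$1" | sed '/^$/d' | sort -u > "$exp_file"
  printf '%s\n' "$2" | sed '/^$/d' | sort -u > "$act_file"
  # Lines only in the expected set were missed; lines only in the answer are extra.
  missing=$(comm -23 "$exp_file" "$act_file" | paste -s -d, -)
  extra=$(comm -13 "$exp_file" "$act_file" | paste -s -d, -)
  rm -f "$exp_file" "$act_file"
  if [ -z "$missing" ] && [ -z "$extra" ]; then
    verdict=PASS
  else
    verdict=FAIL
  fi
  echo "$verdict missing=[$missing] extra=[$extra]"
}

score 'hotfix-security' 'release-v2'
```

With the failing run's answer, this prints `FAIL missing=[hotfix-security] extra=[release-v2]`. Passing `hotfix-security` as both arguments yields `PASS missing=[] extra=[]`.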