GitBench
Release branches
Tests the model's ability to identify release branches that are fully merged into main and can be cleaned up. Evaluates release-branch lifecycle management.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'v1' > version.txt
  5. git add version.txt
  6. git commit -m 'v1.0'
  7. git checkout -b release-1.0
  8. echo 'v1.0' > release.txt
  9. git add release.txt
  10. git commit -m 'Release 1.0'
  11. git checkout main
  12. git merge release-1.0
  13. git checkout -b release-2.0
  14. echo 'v2.0' > release.txt
  15. git add release.txt
  16. git commit -m 'Release 2.0'
  17. git checkout main
Prompt
Look at the branches in this repository. Which branches should be deleted because they are fully merged into main? List ONLY the branch names to delete, one per line. If none, respond with 'none'.
Expected
release-1.0
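The expected answer can be checked directly with Git. The following is a minimal sketch, not part of the benchmark itself: the helper name `merged_into` is hypothetical, and it assumes the default branch is named main, as the setup commands do.

```shell
# Hypothetical helper: list local branches whose tips are reachable from
# the target branch (default: main), excluding the target branch itself.
merged_into() {
  target="${1:-main}"
  # --merged keeps only branches fully contained in the target's history;
  # --format prints bare branch names without the "* " current-branch marker.
  git branch --format='%(refname:short)' --merged "$target" \
    | grep -vxF "$target"
}
```

Run inside the repository built by the setup commands, `merged_into main` should print only release-1.0: release-2.0 carries the 'Release 2.0' commit, which main does not contain.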
deepseek/deepseek-v4-flash:high PASS 100% 116 in → 182 out (175 reasoning)
release-1.0
deepseek/deepseek-v4-flash:none PASS 100% 111 in → 6 out (0 reasoning)
release-1.0
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 110 in → 16 out (0 reasoning)
release-1.0
JSON Schema Structured Output
(raw) { "branches_to_delete": ["release-1.0"] }
mistralai/devstral-2512 PASS 100% 116 in → 6 out
release-1.0
mistralai/devstral-2512__json_schema PASS 100% 118 in → 15 out
release-1.0
JSON Schema Structured Output
(raw) {"branches_to_delete": ["release-1.0"]}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 129 in → 125 out (102 reasoning)
release-1.0
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 128 in → 132 out (105 reasoning)
release-1.0
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "release-1.0" ] }
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 126 in → 21 out (0 reasoning)
release-1.0
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "release-1.0" ] }
poolside/laguna-xs.2:high PASS 100% 164 in → 198 out (190 reasoning)
release-1.0
poolside/laguna-xs.2:high__json_schema PASS 100% 163 in → 135 out (120 reasoning)
release-1.0
JSON Schema Structured Output
(raw) {"branches_to_delete": ["release-1.0"]}
poolside/laguna-xs.2:none PASS 100% 163 in → 7 out (0 reasoning)
release-1.0
poolside/laguna-xs.2:none__json_schema FAIL 100% 163 in → 20 out (0 reasoning)
release-1.0 release-2.0
JSON Schema Structured Output
(raw) {"branches_to_delete": ["release-1.0", "release-2.0"]}
Failure: Extra: ['release-2.0']
Invalid JSON. Output:
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 131 in → 2 out (0 reasoning)
none
Failure: Missing: ['release-1.0'] Extra: ['none']
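The "Missing" and "Extra" failure messages suggest that answers are compared with the expected list as sets of branch names. The sketch below illustrates that kind of comparison. It is an assumption about the scoring logic, not the benchmark's actual grader, and the function name `score_answer` and its output format are hypothetical.

```shell
# Hypothetical scorer: compare two newline-separated branch lists as sets.
# Prints "PASS" when they match, otherwise a FAIL line with the differences.
score_answer() {
  expected_file=$(mktemp)
  actual_file=$(mktemp)
  # comm requires sorted input; sort -u also removes duplicate names.
  printf '%s\n' "$1" | sort -u > "$expected_file"
  printf '%s\n' "$2" | sort -u > "$actual_file"
  # comm -23: lines only in the expected list (missing from the answer).
  missing=$(comm -23 "$expected_file" "$actual_file" | tr '\n' ' ')
  # comm -13: lines only in the answer (extra names).
  extra=$(comm -13 "$expected_file" "$actual_file" | tr '\n' ' ')
  rm -f "$expected_file" "$actual_file"
  if [ -z "$missing" ] && [ -z "$extra" ]; then
    echo "PASS"
  else
    echo "FAIL missing=[${missing% }] extra=[${extra% }]"
  fi
}
```

Under this assumption, an answer of 'none' scored against the expected 'release-1.0' reports release-1.0 as missing and none as extra, which matches the shape of the last failure above.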