GitBench
Single merged branch
Tests the ability to identify the single merged branch that can be safely deleted. Evaluates basic comprehension of git branch --merged.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'Hello' > greeting.txt
  5. git add greeting.txt
  6. git commit -m 'Initial commit'
  7. git checkout -b fix-typo
  8. echo 'Hello fixed' > greeting.txt
  9. git add greeting.txt
  10. git commit -m 'Fix typo'
  11. git checkout main
  12. git merge fix-typo
  13. git checkout -b feature-login
  14. echo 'Login code' > login.py
  15. git add login.py
  16. git commit -m 'Add login'
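The history these commands produce can be modeled as a small commit graph to show why only one branch counts as merged. The sketch below is illustrative and does not call Git. The commit labels c1 to c3 and the helper names are hypothetical, and it assumes that git init creates a branch named main and that the merge in step 12 is a fast-forward (the default when main has no commits of its own since the branch point).

```python
# Hypothetical model of the history produced by the setup commands.
# Commit ids (c1..c3) are illustrative labels, not real hashes.
parents = {
    "c1": [],        # Initial commit
    "c2": ["c1"],    # Fix typo (made on fix-typo)
    "c3": ["c2"],    # Add login (made on feature-login)
}

# Branch tips after the last setup step. The merge is assumed to be a
# fast-forward, so main and fix-typo point at the same commit.
tips = {"main": "c2", "fix-typo": "c2", "feature-login": "c3"}


def ancestors(commit, parents):
    """Return the commit plus every commit reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merged_into(target, tips, parents):
    """Branches whose tip is reachable from the target's tip."""
    reachable = ancestors(tips[target], parents)
    return sorted(name for name, tip in tips.items() if tip in reachable)


merged = merged_into("main", tips, parents)
deletable = [name for name in merged if name != "main"]
print(merged)     # ['fix-typo', 'main']
print(deletable)  # ['fix-typo']
```

The target branch is always reachable from itself, so main appears in the merged set and has to be filtered out before answering. feature-login is excluded because its tip (c3) is ahead of main.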
Prompt
Look at the branches in this repository. Which branches should be deleted because they are fully merged into main? List ONLY the branch names to delete, one per line. If none, respond with 'none'.
Expected
fix-typo
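The results report mismatches against this expected answer as Missing and Extra lists. One plausible way to compute them is a set comparison, sketched below. The function names score and parse_answer are hypothetical, and the benchmark's actual scoring code is not shown in the source, so treat this as an assumption about its behavior rather than a description of it.

```python
def parse_answer(text):
    """Split a free-text answer into branch names; 'none' means no branches."""
    names = text.split()
    if [name.lower() for name in names] == ["none"]:
        return []
    return names


def score(expected, actual):
    """Compare branch names as sets and describe any mismatch."""
    expected_set, actual_set = set(expected), set(actual)
    missing = sorted(expected_set - actual_set)  # expected but not answered
    extra = sorted(actual_set - expected_set)    # answered but not expected
    parts = []
    if missing:
        parts.append(f"Missing: {missing}")
    if extra:
        parts.append(f"Extra: {extra}")
    return not parts, " ".join(parts)


print(score(["fix-typo"], parse_answer("fix-typo")))
# (True, '')
print(score(["fix-typo"], parse_answer("fix-typo main")))
# (False, "Extra: ['main']")
print(score(["fix-typo"], parse_answer("feature-login")))
# (False, "Missing: ['fix-typo'] Extra: ['feature-login']")
```

Comparing sets rather than lists makes the check independent of the order in which a model lists the branches.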
deepseek/deepseek-v4-flash:high PASS 100% 103 in → 248 out (242 reasoning)
fix-typo
deepseek/deepseek-v4-flash:none PASS 100% 118 in → 4 out (0 reasoning)
fix-typo
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 105 in → 19 out (0 reasoning)
fix-typo
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "fix-typo" ] }
mistralai/devstral-2512 PASS 100% 107 in → 5 out
fix-typo
mistralai/devstral-2512__json_schema PASS 100% 103 in → 14 out
fix-typo
JSON Schema Structured Output
(raw) {"branches_to_delete": ["fix-typo"]}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 117 in → 164 out (169 reasoning)
fix-typo
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 119 in → 354 out (331 reasoning)
fix-typo
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "fix-typo" ] }
nvidia/nemotron-3-nano-30b-a3b:none FAIL 100% 119 in → 7 out (0 reasoning)
fix-typo main
Failure: Extra: ['main']
nvidia/nemotron-3-nano-30b-a3b:none__json_schema FAIL 100% 116 in → 24 out (0 reasoning)
fix-typo main
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "fix-typo", "main" ] }
Failure: Extra: ['main']
poolside/laguna-xs.2:high PASS 100% 145 in → 173 out (167 reasoning)
fix-typo
poolside/laguna-xs.2:high__json_schema PASS 100% 150 in → 118 out (106 reasoning)
fix-typo
JSON Schema Structured Output
(raw) {"branches_to_delete": ["fix-typo"]}
poolside/laguna-xs.2:none FAIL 100% 148 in → 8 out (0 reasoning)
feature-login fix-typo
Failure: Extra: ['feature-login']
Invalid JSON. Output:
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
poolside/laguna-xs.2:none__json_schema FAIL 0% 149 in → 10 out (0 reasoning)
feature-login
JSON Schema Structured Output
(raw) {"branches_to_delete": ["feature-login"]}
Failure: Missing: ['fix-typo'] Extra: ['feature-login']
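The structured-output runs above return an object with a branches_to_delete array, and one run fails with a JSON parse error. A minimal sketch of how such a response might be handled follows. The function name parse_structured and the schema check are assumptions; only the branches_to_delete key and the error wording come from the results. The "Expecting value: line 1 column 1 (char 0)" message is what Python's json module reports for an empty string, which would be consistent with an empty response body.

```python
import json


def parse_structured(raw):
    """Return (branches, error) for a raw structured-output string."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"Failed to parse structured JSON response: {exc}"
    # Assumed schema: an object with a list of strings under branches_to_delete.
    branches = payload.get("branches_to_delete") if isinstance(payload, dict) else None
    if not isinstance(branches, list) or not all(isinstance(b, str) for b in branches):
        return None, "Response does not match the assumed schema"
    return branches, None


print(parse_structured('{"branches_to_delete": ["fix-typo"]}'))
# (['fix-typo'], None)
print(parse_structured(""))
# (None, 'Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)')
```

Returning the error as a value instead of raising keeps a malformed response from aborting the rest of a run, so it can be recorded as a failure like any wrong answer.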