GitBench
Branches off branches
Tests the ability to handle nested branch topology (branches created from other branches). Evaluates understanding of merge status when a branch forks from a branch other than main.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'v1' > app.py
  5. git add app.py
  6. git commit -m 'Initial'
  7. git checkout -b feature-base
  8. echo 'base feature' > base.py
  9. git add base.py
  10. git commit -m 'Base feature'
  11. git checkout -b feature-sub
  12. echo 'sub feature' > sub.py
  13. git add sub.py
  14. git commit -m 'Sub feature'
  15. git checkout feature-base
  16. git merge feature-sub
  17. git checkout main
  18. git merge feature-base
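One way to confirm the expected answer is to replay the setup and ask Git directly with git branch --merged main, which lists branches whose tips are reachable from main. The Python sketch below does that in a temporary directory, assuming git is on the PATH. The helper names are hypothetical, and two steps are additions not in the original list: the initial branch is forced to be named main (step 17 otherwise depends on the local init.defaultBranch setting), and commit signing is disabled so a global signing setting cannot break the commits.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def commit_file(repo: Path, name: str, content: str, message: str) -> None:
    """Equivalent of: echo CONTENT > NAME; git add NAME; git commit -m MESSAGE."""
    (repo / name).write_text(content + "\n")
    git(repo, "add", name)
    git(repo, "commit", "-q", "-m", message)


def build_repo(repo: Path) -> None:
    """Replay setup steps 1-18 inside `repo`."""
    git(repo, "init", "-q")
    # Addition (not in the original steps): name the unborn branch 'main'
    # so that step 17 works regardless of init.defaultBranch.
    git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
    git(repo, "config", "user.email", "test@test.com")
    git(repo, "config", "user.name", "Test User")
    # Addition: keep a global commit-signing setting from breaking commits.
    git(repo, "config", "commit.gpgsign", "false")
    commit_file(repo, "app.py", "v1", "Initial")
    git(repo, "checkout", "-q", "-b", "feature-base")
    commit_file(repo, "base.py", "base feature", "Base feature")
    git(repo, "checkout", "-q", "-b", "feature-sub")
    commit_file(repo, "sub.py", "sub feature", "Sub feature")
    git(repo, "checkout", "-q", "feature-base")
    git(repo, "merge", "-q", "feature-sub")
    git(repo, "checkout", "-q", "main")
    git(repo, "merge", "-q", "feature-base")


def merged_into_main(repo: Path) -> list[str]:
    """Branches whose tips are reachable from main, excluding main itself."""
    out = git(repo, "branch", "--merged", "main", "--format=%(refname:short)")
    return sorted(name for name in out.splitlines() if name and name != "main")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        build_repo(repo)
        print("\n".join(merged_into_main(repo)))
```

Run as a script, this should print feature-base and feature-sub on separate lines, matching the expected answer. Delegating the check to git branch --merged keeps the sketch honest: it asks Git the same question the prompt asks, instead of re-deriving merge status by hand.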
Prompt
Look at the branches in this repository. Which branches should be deleted because they are fully merged into main? List ONLY the branch names to delete, one per line. If none, respond with 'none'.
Expected
feature-base feature-sub
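The expected answer follows from reachability: Git treats a branch as merged into main when the branch's tip commit is reachable from main's tip. With Git's default fast-forward behavior, both merges in the setup (steps 16 and 18) are fast-forwards, so afterwards main, feature-base and feature-sub all point at the 'Sub feature' commit. That is why feature-sub counts as merged even though it was never merged into main directly. The pure-Python sketch below models the check on a toy commit graph; the labels C1 to C3 and the function names are hypothetical stand-ins, not anything Git exposes.

```python
def ancestors(commit: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every commit reachable from `commit` via parent links, itself included."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def fully_merged(
    branches: dict[str, str], parents: dict[str, list[str]], target: str = "main"
) -> list[str]:
    """List branches other than `target` whose tip is reachable from `target`'s tip."""
    reachable = ancestors(branches[target], parents)
    return sorted(
        name for name, tip in branches.items() if name != target and tip in reachable
    )


# Toy history: C1 'Initial' <- C2 'Base feature' <- C3 'Sub feature'.
PARENTS = {"C1": [], "C2": ["C1"], "C3": ["C2"]}

# After step 18, both fast-forward merges leave every branch on C3.
after_setup = {"main": "C3", "feature-base": "C3", "feature-sub": "C3"}
print(fully_merged(after_setup, PARENTS))  # ['feature-base', 'feature-sub']

# Before step 18, main was still at C1, so nothing was merged into main yet.
before_step_18 = {"main": "C1", "feature-base": "C3", "feature-sub": "C3"}
print(fully_merged(before_step_18, PARENTS))  # []
```

The second case shows that the answer depends on main's position, not on which branch feature-sub was merged into.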
Results
deepseek/deepseek-v4-flash:high PASS 100% 101 in → 246 out (260 reasoning)
feature-base feature-sub
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 98 in → 179 out (183 reasoning)
feature-base feature-sub
JSON Schema Structured Output
(raw) { "branches_to_delete": ["feature-base", "feature-sub"] }
deepseek/deepseek-v4-flash:none PASS 100% 97 in → 5 out (0 reasoning)
feature-base feature-sub
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 94 in → 23 out (0 reasoning)
feature-base feature-sub
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "feature-base", "feature-sub" ] }
mistralai/devstral-2512 PASS 100% 107 in → 6 out
feature-base feature-sub
mistralai/devstral-2512__json_schema PASS 100% 104 in → 16 out
feature-base feature-sub
JSON Schema Structured Output
(raw) {"branches_to_delete": ["feature-base", "feature-sub"]}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 108 in → 189 out (212 reasoning)
feature-base feature-sub
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 114 in → 93 out (91 reasoning)
feature-base feature-sub
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "feature-base", "feature-sub" ] }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 117 in → 6 out (0 reasoning)
feature-base feature-sub
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 111 in → 23 out (0 reasoning)
feature-base feature-sub
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "feature-base", "feature-sub" ] }
poolside/laguna-xs.2:high PASS 100% 157 in → 128 out (120 reasoning)
feature-base feature-sub
poolside/laguna-xs.2:high__json_schema PASS 100% 154 in → 168 out (153 reasoning)
feature-base feature-sub
JSON Schema Structured Output
(raw) {"branches_to_delete": ["feature-base", "feature-sub"]}
poolside/laguna-xs.2:none PASS 100% 154 in → 7 out (0 reasoning)
feature-base feature-sub
poolside/laguna-xs.2:none__json_schema PASS 100% 154 in → 14 out (0 reasoning)
feature-base feature-sub
JSON Schema Structured Output
(raw) {"branches_to_delete": ["feature-base", "feature-sub"]}
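Each run above records either a plain-text answer or, for the __json_schema variants, a raw JSON object with a branches_to_delete array. The page does not say how answers are scored. The sketch below assumes an order-insensitive set comparison against the expected names, with a lone 'none' mapping to the empty set; the function names and this grading rule are hypothetical.

```python
import json

EXPECTED = {"feature-base", "feature-sub"}


def parse_text_answer(text: str) -> set[str]:
    """Split a plain-text answer on whitespace; a lone 'none' means no branches."""
    tokens = text.split()
    if [t.lower() for t in tokens] == ["none"]:
        return set()
    return set(tokens)


def parse_json_answer(raw: str) -> set[str]:
    """Read the branches_to_delete array from a structured-output response."""
    data = json.loads(raw)
    return {str(name).strip() for name in data["branches_to_delete"]}


def grade(answer: set[str], expected: set[str] = EXPECTED) -> bool:
    """Assumed rule: pass only on an exact match, ignoring order and layout."""
    return answer == expected


print(grade(parse_text_answer("feature-base feature-sub")))  # True
print(grade(parse_json_answer('{ "branches_to_delete": [ "feature-base", "feature-sub" ] }')))  # True
```

Comparing sets makes one-name-per-line and space-separated answers equivalent, and it ignores the whitespace differences between the raw JSON outputs shown above.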