GitBench
Only main branch
Tests the ability to handle a repository that has only the main branch (the trivial case). This covers the minimum-viable-repository edge case.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'Hello' > hello.txt
  5. git add hello.txt
  6. git commit -m 'Initial commit'
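The setup steps above can be replayed locally to confirm the expected answer. The script below is a minimal sketch, not the benchmark's own harness. It assumes git 2.28 or newer, because it passes `-b main` to `git init` so that the initial branch is named `main` regardless of the local `init.defaultBranch` setting. It then asks git which branches are merged into `main`. `git branch --merged main` lists `main` itself, so the script filters it out. Any branch left over would be a deletion candidate, and an empty result maps to the expected `none`.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the "Only main branch" fixture in a throwaway directory
# and derive the expected answer with git's own merge check.
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"

# Assumption: force the initial branch name to "main" (requires git >= 2.28).
# A plain `git init` may create "master" depending on init.defaultBranch.
git init -q -b main
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'Hello' > hello.txt
git add hello.txt
git commit -q -m 'Initial commit'

# List branches whose tips are reachable from main, then drop main itself.
# grep exits 1 when no lines remain, so `|| true` keeps `set -e` from aborting.
candidates="$(git branch --merged main --format='%(refname:short)' | grep -vx main || true)"

# An empty candidate list corresponds to the prompt's 'none' answer.
if [ -z "$candidates" ]; then
  result="none"
else
  result="$candidates"
fi
echo "$result"
```

Filtering out `main` matters: a model that reports `main` because git lists it as "merged" (as the `nvidia/nemotron-3-nano-30b-a3b:none__json_schema` run below does) gives an answer that is graded as an extra branch.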
Prompt
Look at the branches in this repository. Which branches should be deleted because they are fully merged into main? List ONLY the branch names to delete, one per line. If none, respond with 'none'.
Expected
none
deepseek/deepseek-v4-flash:high PASS 100% 68 in → 66 out (63 reasoning)
none
deepseek/deepseek-v4-flash:none PASS 100% 69 in → 2 out (0 reasoning)
none
mistralai/devstral-2512 PASS 100% 69 in → 2 out
none
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 81 in → 82 out (87 reasoning)
none
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 82 in → 72 out (69 reasoning)
none
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "none" ] }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 81 in → 2 out (0 reasoning)
none
poolside/laguna-xs.2:high PASS 100% 115 in → 337 out (333 reasoning)
none
poolside/laguna-xs.2:high__json_schema PASS 100% 116 in → 159 out (144 reasoning)
none
JSON Schema Structured Output
(raw) { "branches_to_delete": ["none"] }
poolside/laguna-xs.2:none PASS 100% 114 in → 3 out (0 reasoning)
none
poolside/laguna-xs.2:none__json_schema PASS 100% 115 in → 9 out (0 reasoning)
none
JSON Schema Structured Output
(raw) {"branches_to_delete": ["none"]}
Invalid JSON. Output:
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
deepseek/deepseek-v4-flash:none__json_schema FAIL 0% 67 in → 10 out (0 reasoning)
(empty output)
JSON Schema Structured Output
(raw) { "branches_to_delete": [] }
Failure: Missing: ['none']
(empty output)
JSON Schema Structured Output
(raw) {"branches_to_delete": []}
Failure: Missing: ['none']
nvidia/nemotron-3-nano-30b-a3b:none__json_schema FAIL 0% 81 in → 19 out (0 reasoning)
main
JSON Schema Structured Output
(raw) { "branches_to_delete": [ "main" ] }
Failure: Missing: ['none'] Extra: ['main']