GitBench
List existing worktrees
Tests whether the model can list every worktree attached to a repository. Evaluates knowledge of the worktree-enumeration command.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'Hello' > hello.txt
  5. git add hello.txt
  6. git commit -m 'Initial commit'
  7. git checkout -b feature
  8. echo 'Feature' > feature.txt
  9. git add feature.txt
  10. git commit -m 'Add feature'
  11. git checkout main
  12. git worktree add ../feature-wt feature
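
The setup steps above can be reproduced locally with a sketch like the following. It runs the same commands in a throwaway directory and then counts the lines printed by the expected command. Two things here are assumptions and not part of the original setup: the temporary directory layout, and the `git symbolic-ref` call, which pins the initial branch name to main so that the later `git checkout main` works whatever `init.defaultBranch` is set to.

```shell
#!/bin/sh
set -eu

# Assumption: work in a throwaway directory so ../feature-wt lands next to the repo.
workdir=$(mktemp -d)
mkdir "$workdir/repo"
cd "$workdir/repo"

git init -q
# Assumption: pin the unborn branch to main; the original setup relies on the default.
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'

# First commit on main.
echo 'Hello' > hello.txt
git add hello.txt
git commit -q -m 'Initial commit'

# Second commit on a feature branch.
git checkout -q -b feature
echo 'Feature' > feature.txt
git add feature.txt
git commit -q -m 'Add feature'

# Return to main and attach the feature branch as a linked worktree.
git checkout -q main
git worktree add ../feature-wt feature >/dev/null 2>&1

# The command the task expects prints one line per worktree:
# the main working tree plus the linked one.
worktree_count=$(git worktree list | wc -l | tr -d ' ')
echo "worktrees: $worktree_count"
```

With this setup the count should be 2: the main working tree and the linked worktree at ../feature-wt.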
Prompt
Run the command to list all git worktrees. Output ONLY the git command, nothing else.
Expected
git worktree list
deepseek/deepseek-v4-flash:high PASS 100% 193 in → 77 out (71 reasoning)
git worktree list
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 248 in → 84 out (79 reasoning)
git worktree list
JSON Schema Structured Output
(raw) {"command": "git worktree list"}
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 194 in → 13 out (0 reasoning)
git worktree list
JSON Schema Structured Output
(raw) { "command": "git worktree list" }
mistralai/devstral-2512__json_schema PASS 100% 231 in → 10 out
git worktree list
JSON Schema Structured Output
(raw) {"command": "git worktree list"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 243 in → 125 out (127 reasoning)
git worktree list
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 250 in → 80 out (71 reasoning)
git worktree list
JSON Schema Structured Output
(raw) { "command": "git worktree list" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 245 in → 5 out (0 reasoning)
git worktree list
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 239 in → 13 out (0 reasoning)
git worktree list
JSON Schema Structured Output
(raw) { "command": "git worktree list" }
poolside/laguna-xs.2:high PASS 100% 261 in → 90 out (83 reasoning)
git worktree list
poolside/laguna-xs.2:high__json_schema PASS 100% 261 in → 90 out (79 reasoning)
git worktree list
JSON Schema Structured Output
(raw) {"command": "git worktree list"}
poolside/laguna-xs.2:none PASS 100% 269 in → 6 out (0 reasoning)
git worktree list
poolside/laguna-xs.2:none__json_schema PASS 100% 264 in → 10 out (0 reasoning)
git worktree list
JSON Schema Structured Output
(raw) {"command": "git worktree list"}
deepseek/deepseek-v4-flash:none FAIL
(empty output)
Failure: Model 'deepseek/deepseek-v4-flash:none' violated the reasoning_level=none invariant: provider reported 75 reasoning token(s)
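
The invariant named in this failure can be sketched as a small check. This is an illustration only, not the benchmark's actual code: the function name `check_reasoning_invariant` and its arguments are hypothetical, and the only behavior taken from the page is that a run configured with reasoning level none fails when the provider reports any reasoning tokens.

```shell
#!/bin/sh
# Hypothetical helper: fail when reasoning_level=none but reasoning tokens were reported.
# $1 = configured reasoning level, $2 = reasoning token count reported by the provider.
check_reasoning_invariant() {
    level=$1
    reasoning_tokens=$2
    if [ "$level" = "none" ] && [ "$reasoning_tokens" -gt 0 ]; then
        echo "violated reasoning_level=none: $reasoning_tokens reasoning token(s)"
        return 1
    fi
    return 0
}

# A none-level run that reports 75 reasoning tokens trips the check.
check_reasoning_invariant none 75 || echo "run rejected"
# A none-level run with 0 reasoning tokens is accepted.
check_reasoning_invariant none 0 && echo "run accepted"
```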
mistralai/devstral-2512 FAIL 0% 234 in → 10 out
```bash git worktree list ```
Failure: Command did not match accepted alternatives: ['git worktree list', 'git worktree list --porcelain']
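
A minimal sketch of why the fenced answer fails, assuming the grader compares the whole output string against the accepted alternatives without stripping Markdown. The function name `matches_accepted` is hypothetical. The two accepted strings come from the failure message above, and the fenced answer is written the way this page displays it, on a single line.

```shell
#!/bin/sh
# Hypothetical exact-match check; the accepted strings are taken from the failure message.
matches_accepted() {
    case "$1" in
        'git worktree list' | 'git worktree list --porcelain') return 0 ;;
        *) return 1 ;;
    esac
}

# Build the fenced answer without typing literal backticks (octal 140 is a backtick).
fence=$(printf '\140\140\140')
fenced_answer="${fence}bash git worktree list ${fence}"

# The bare command matches; the same command wrapped in a fence does not.
for answer in 'git worktree list' "$fenced_answer"; do
    if matches_accepted "$answer"; then
        echo "PASS: $answer"
    else
        echo "FAIL: $answer"
    fi
done
```

Under this assumption the command itself is correct and only the surrounding fence causes the mismatch, which is consistent with the prompt's instruction to output only the git command.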