GitBench
Dry-run with directory removal preview
Tests the ability to preview directory removal with a dry run. Evaluates combined flag usage (-nd).

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'tracked' > tracked.txt
  5. git add tracked.txt
  6. git commit -m 'Initial commit'
  7. mkdir -p artifacts/logs
  8. echo 'log' > artifacts/logs/run.log
  9. echo 'untracked' > scratch.txt
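The setup steps above can be replayed locally to inspect the starting state. The following is a minimal sketch, not the benchmark's own harness: it assumes a POSIX shell with `git` and `mktemp` available, and the variable names `repo` and `status` are illustrative only. It runs the nine commands in a throwaway directory and then lists what Git sees as untracked.

```sh
#!/bin/sh
# Sketch: replay the setup steps in a throwaway directory (assumed layout,
# not the benchmark's actual runner).
set -eu

repo="$(mktemp -d)"   # hypothetical scratch location
cd "$repo"

# Setup steps, in the order listed above (-q only silences progress output)
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'tracked' > tracked.txt
git add tracked.txt
git commit -q -m 'Initial commit'
mkdir -p artifacts/logs
echo 'log' > artifacts/logs/run.log
echo 'untracked' > scratch.txt

# Show the starting state: one tracked file, plus an untracked
# directory and an untracked file
status="$(git status --porcelain)"
printf '%s\n' "$status"
```

With this layout, `git status --porcelain` should report `artifacts/` and `scratch.txt` as untracked (`??`), which is the state the prompt asks the model to preview cleaning.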
Prompt
Preview what untracked files and directories would be removed without actually removing them. Output ONLY the git command, nothing else.
Expected
git clean -f -d -n
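To see what the expected command reports, the sketch below rebuilds the same repository in a temporary directory and runs the dry run twice: once with `-n` alone and once with `-f -d -n`. It is an illustration under assumptions (POSIX shell, `git` and `mktemp` installed; the names `repo`, `preview_files`, and `preview_all` are hypothetical). In Git's documented behavior, `-n` only prints what would be removed, and untracked directories are included only when `-d` is given.

```sh
#!/bin/sh
# Sketch: compare dry-run previews with and without -d (illustrative only).
set -eu

repo="$(mktemp -d)"   # hypothetical scratch location
cd "$repo"

# Recreate the starting state described in the setup steps
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'tracked' > tracked.txt
git add tracked.txt
git commit -q -m 'Initial commit'
mkdir -p artifacts/logs
echo 'log' > artifacts/logs/run.log
echo 'untracked' > scratch.txt

# Dry run without -d: untracked directories are not listed
preview_files="$(git clean -n)"

# Dry run with -d: untracked directories are listed as well
preview_all="$(git clean -f -d -n)"

printf 'without -d:\n%s\n\n' "$preview_files"
printf 'with -d:\n%s\n' "$preview_all"

# Nothing was deleted, because -n only previews
ls -R artifacts scratch.txt
```

In this repository the `-d` variant should list both `artifacts/` and `scratch.txt`, while `-n` alone lists only `scratch.txt`; both leave the working tree untouched.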
deepseek/deepseek-v4-flash:high PASS 100% 63 in → 148 out (141 reasoning)
git clean -nfd
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 65 in → 295 out (282 reasoning)
git clean -n
JSON Schema Structured Output
(raw) { "command": "git clean -n" }
deepseek/deepseek-v4-flash:none PASS 100% 63 in → 6 out (0 reasoning)
git clean -n -d
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 63 in → 13 out (0 reasoning)
git clean -n
JSON Schema Structured Output
(raw) { "command": "git clean -n" }
mistralai/devstral-2512 PASS 100% 60 in → 10 out
```sh git clean -n ```
mistralai/devstral-2512__json_schema PASS 100% 60 in → 10 out
git clean -n
JSON Schema Structured Output
(raw) {"command": "git clean -n"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 73 in → 255 out (258 reasoning)
git clean -n -d .
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 73 in → 192 out (183 reasoning)
git clean -n
JSON Schema Structured Output
(raw) { "command": "git clean -n" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 73 in → 11 out (0 reasoning)
git clean -n -q artifacts/ scratch.txt
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 73 in → 13 out (0 reasoning)
git clean -n --directory artifacts
JSON Schema Structured Output
(raw) {"command": "git clean -n --directory artifacts"}
poolside/laguna-xs.2:high PASS 100% 109 in → 211 out (203 reasoning)
git clean -fdn
poolside/laguna-xs.2:high__json_schema PASS 100% 109 in → 167 out (154 reasoning)
git clean -n -d
JSON Schema Structured Output
(raw) {"command": "git clean -n -d"}
poolside/laguna-xs.2:none PASS 100% 109 in → 6 out (0 reasoning)
git clean -dn
poolside/laguna-xs.2:none__json_schema PASS 100% 109 in → 10 out (0 reasoning)
git clean -nd
JSON Schema Structured Output
(raw) {"command": "git clean -nd"}