GitBench
Remove untracked files with interactive mode confirmation
Tests the ability to use interactive-mode confirmation with git clean -i. Evaluates knowledge of the interactive-cleanup workflow.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'tracked' > tracked.txt
  5. git add tracked.txt
  6. git commit -m 'Initial commit'
  7. echo 'junk1' > junk1.txt
  8. echo 'junk2' > junk2.txt
  9. mkdir -p builddir
  10. echo 'obj' > builddir/out.o
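The setup steps above can be replayed as one script. This is a minimal sketch, assuming a POSIX shell with git on the PATH. The temporary directory and the `repo` variable are illustrative and not part of the benchmark, and the `-q` flags are added only to keep the output short.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch location; the benchmark does not say where the repo lives.
repo=$(mktemp -d)
cd "$repo"

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'

# One tracked file with a single commit behind it.
echo 'tracked' > tracked.txt
git add tracked.txt
git commit -q -m 'Initial commit'

# Untracked content that the prompt asks the model to remove.
echo 'junk1' > junk1.txt
echo 'junk2' > junk2.txt
mkdir -p builddir
echo 'obj' > builddir/out.o

# List the untracked entries in the short, script-friendly format.
git status --porcelain
```

After the script runs, only tracked.txt is known to Git; the two junk files and builddir are untracked.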
Prompt
Remove all untracked files and directories non-interactively (force mode). Output ONLY the git command, nothing else.
Expected
git clean -f -d
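As a rough illustration of what the expected command does, the sketch below previews the cleanup with a dry run and then performs it. It assumes a POSIX shell with git available and uses a shortened, hypothetical copy of the setup in a temporary directory; the dry-run step is an addition for illustration and is not part of the expected answer.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repo, a shortened version of the benchmark setup.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'tracked' > tracked.txt
git add tracked.txt
git commit -q -m 'Initial commit'
echo 'junk1' > junk1.txt
mkdir -p builddir
echo 'obj' > builddir/out.o

# -n (dry run) only reports what would be removed; nothing is deleted.
git clean -n -d

# -f forces the deletion and -d extends it to untracked directories.
# The combined form `git clean -fd` is the same invocation.
git clean -f -d
```

Most of the recorded answers use the combined form `-fd`, which Git parses the same way as the separate flags.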
deepseek/deepseek-v4-flash:high PASS 100% 72 in → 241 out (236 reasoning)
git clean -fd
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 136 in → 95 out (90 reasoning)
git clean -fd
JSON Schema Structured Output
(raw) {"command": "git clean -fd"}
deepseek/deepseek-v4-flash:none PASS 100% 72 in → 4 out (0 reasoning)
git clean -fd
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 72 in → 13 out (0 reasoning)
git clean -fd
JSON Schema Structured Output
(raw) { "command": "git clean -fd" }
mistralai/devstral-2512__json_schema PASS 100% 72 in → 10 out
git clean -fd
JSON Schema Structured Output
(raw) {"command": "git clean -fd"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 85 in → 519 out (540 reasoning)
git clean -fd
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 85 in → 294 out (295 reasoning)
git clean -fd
JSON Schema Structured Output
(raw) { "command": "git clean -fd" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 85 in → 6 out (0 reasoning)
git clean -fdx
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 85 in → 11 out (0 reasoning)
git clean -fd --force
JSON Schema Structured Output
(raw) {"command":"git clean -fd --force"}
poolside/laguna-xs.2:high PASS 100% 119 in → 141 out (134 reasoning)
git clean -fd
poolside/laguna-xs.2:high__json_schema PASS 100% 119 in → 183 out (172 reasoning)
git clean -fd
JSON Schema Structured Output
(raw) {"command": "git clean -fd"}
poolside/laguna-xs.2:none PASS 100% 119 in → 6 out (0 reasoning)
git clean -fd
poolside/laguna-xs.2:none__json_schema PASS 100% 119 in → 10 out (0 reasoning)
git clean -fd
JSON Schema Structured Output
(raw) {"command": "git clean -fd"}
mistralai/devstral-2512 FAIL 25% 72 in → 10 out
```bash git clean -fd ```
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'junk1.txt'}, {'type': 'file_not_exists', 'path': 'junk2.txt'}, {'type': 'dir_not_exists', 'path': 'builddir'}]
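The three failed assertions can be approximated with ordinary shell tests. The sketch below is an assumption about how such checks might look, not the harness's actual code: the helper names mirror the assertion types in the failure message, and the repo is a hypothetical temporary copy of the setup. The failing response differs from the passing ones in being wrapped in a Markdown code fence; the source does not state whether that is the cause.

```shell
#!/bin/sh
set -eu

# Helpers named after the assertion types in the failure message.
# Their implementation here is an assumption, not the harness's code.
file_not_exists() { [ ! -f "$1" ]; }
dir_not_exists()  { [ ! -d "$1" ]; }

# Hypothetical scratch repo mirroring the benchmark setup.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'tracked' > tracked.txt
git add tracked.txt
git commit -q -m 'Initial commit'
echo 'junk1' > junk1.txt
echo 'junk2' > junk2.txt
mkdir -p builddir
echo 'obj' > builddir/out.o

# Run the bare command, without any Markdown fence around it.
git clean -q -f -d

# With `set -e`, the script stops at the first assertion that does not hold.
file_not_exists junk1.txt
file_not_exists junk2.txt
dir_not_exists builddir
echo 'all assertions passed'
```

If the cleanup command never runs, all three paths remain in place, which matches the pattern of failures reported above.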