GitBench
Remove ignored build artifacts with nested directories
Tests the ability to remove ignored build artifacts, including ones inside nested directories, without touching untracked files.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. printf '*.o\n*.pyc\n__pycache__/\nbuild/\n' > .gitignore
  5. echo 'src' > main.py
  6. git add .gitignore main.py
  7. git commit -m 'Initial commit'
  8. echo 'obj' > module.o
  9. mkdir -p __pycache__
  10. echo 'cache' > __pycache__/main.pyc
  11. mkdir -p build/dist
  12. echo 'binary' > build/dist/app
  13. echo 'untracked' > notes.md
Prompt
Remove all ignored files and directories while preserving untracked files. Output ONLY the git command, nothing else.
Expected
git clean -f -X -d
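The expected command can be tried safely on a throwaway copy of the setup. The sketch below is not part of the benchmark: it assumes the `.gitignore` holds one pattern per line (written with `\n` escapes) and uses a scratch directory from `mktemp -d`. It previews the deletion with `-n` before running the real command.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so nothing real is deleted.
repo=$(mktemp -d)
cd "$repo"

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'

# Assumption: one ignore pattern per line.
printf '*.o\n*.pyc\n__pycache__/\nbuild/\n' > .gitignore
echo 'src' > main.py
git add .gitignore main.py
git commit -q -m 'Initial commit'

# Ignored artifacts plus one untracked, non-ignored file.
echo 'obj' > module.o
mkdir -p __pycache__ build/dist
echo 'cache' > __pycache__/main.pyc
echo 'binary' > build/dist/app
echo 'untracked' > notes.md

# -n: dry run, only lists what would be removed.
# -X: restrict removal to ignored files.
# -d: also recurse into untracked directories.
git clean -n -X -d

# -f: actually delete (git refuses without it by default).
git clean -f -X -d

# Inspect what is left in the working tree.
ls -A
```

Because `-X` (uppercase) limits removal to ignored paths, `notes.md` is left alone; lowercase `-x` would remove it as well.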
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 124 in → 211 out (195 reasoning)
git clean -fdX
JSON Schema Structured Output
(raw) {"command": "git clean -fdX"}
deepseek/deepseek-v4-flash:none PASS 100% 60 in → 5 out (0 reasoning)
git clean -fdX
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 62 in → 12 out (0 reasoning)
git clean -fdX
JSON Schema Structured Output
(raw) { "command": "git clean -fdX" }
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 72 in → 493 out (518 reasoning)
git clean -Xfd
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 72 in → 281 out (306 reasoning)
git clean -fdX
JSON Schema Structured Output
(raw) { "command": "git clean -fdX" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 72 in → 6 out (0 reasoning)
git clean -fdX
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 72 in → 14 out (0 reasoning)
git clean -fdX
JSON Schema Structured Output
(raw) { "command": "git clean -fdX" }
deepseek/deepseek-v4-flash:high FAIL 60% 60 in → 187 out (178 reasoning)
git clean -fX
Failure: Failed assertions: [{'type': 'dir_not_exists', 'path': '__pycache__'}, {'type': 'dir_not_exists', 'path': 'build'}]
poolside/laguna-xs.2:high FAIL 60% 107 in → 308 out (299 reasoning)
git clean -X -f
Failure: Failed assertions: [{'type': 'dir_not_exists', 'path': '__pycache__'}, {'type': 'dir_not_exists', 'path': 'build'}]
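The two failures above differ from the passing answers only by the missing `-d`. As a rough illustration, assuming a scratch repository with a reduced `.gitignore` (the variable `survivors` is just a name used here), the following sketch records which ignored directories are still present after `git clean -f -X` and then repeats the cleanup with `-d`.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
printf '*.o\n__pycache__/\nbuild/\n' > .gitignore
git add .gitignore
git commit -q -m 'Initial commit'

echo 'obj' > module.o
mkdir -p __pycache__ build/dist
echo 'cache' > __pycache__/main.pyc
echo 'binary' > build/dist/app

# Without -d, git clean does not recurse into untracked directories.
git clean -f -X

# Record which ignored directories are still on disk.
survivors=""
for d in __pycache__ build; do
  if [ -d "$d" ]; then
    survivors="$survivors $d"
  fi
done
echo "left after -f -X:$survivors"

# With -d the directories are removed as well.
git clean -f -X -d
ls -A
```

This matches the failed `dir_not_exists` assertions: the top-level `module.o` is removed either way, but the directories are only cleaned when `-d` is given.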
mistralai/devstral-2512 FAIL 40% 59 in → 11 out
```bash git clean -fdX ```
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'module.o'}, {'type': 'dir_not_exists', 'path': '__pycache__'}, {'type': 'dir_not_exists', 'path': 'build'}]
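The command inside this answer is the same one that passes elsewhere, yet every removal assertion fails. One plausible reading, which the log does not confirm, is that the answer was executed verbatim with its Markdown fence attached. A harness that wanted to tolerate such answers could strip the fence first; `strip_fence` below is a hypothetical helper, not something the benchmark is known to use.

```shell
#!/bin/sh
set -eu

# Build the three-backtick marker from its octal code so that this
# script contains no literal fence of its own.
fence=$(printf '\140\140\140')

# Hypothetical helper: join lines, then drop a leading fence (with an
# optional language tag) and a trailing fence.
strip_fence() {
  printf '%s' "$1" | tr '\n' ' ' |
    sed -e "s/^ *${fence}[A-Za-z]* *//" \
        -e "s/ *${fence} *\$//" \
        -e 's/ *$//'
}

raw="${fence}bash
git clean -fdX
${fence}"

strip_fence "$raw"   # prints: git clean -fdX
echo
```

Since the prompt asks for ONLY the git command, scoring the fenced answer as a failure is also a defensible design choice.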
mistralai/devstral-2512__json_schema FAIL 40% 59 in → 12 out
git clean -X -d
JSON Schema Structured Output
(raw) {"command": "git clean -X -d"}
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'module.o'}, {'type': 'dir_not_exists', 'path': '__pycache__'}, {'type': 'dir_not_exists', 'path': 'build'}]
poolside/laguna-xs.2:high__json_schema FAIL 40% 107 in → 3,851 out (3,838 reasoning)
git clean -d -x
JSON Schema Structured Output
(raw) {"command": "git clean -d -x"}
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'module.o'}, {'type': 'dir_not_exists', 'path': '__pycache__'}, {'type': 'dir_not_exists', 'path': 'build'}]
poolside/laguna-xs.2:none FAIL 40% 107 in → 6 out (0 reasoning)
git clean -i
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'module.o'}, {'type': 'dir_not_exists', 'path': '__pycache__'}, {'type': 'dir_not_exists', 'path': 'build'}]
poolside/laguna-xs.2:none__json_schema FAIL 40% 107 in → 17 out (0 reasoning)
git clean -d -i --exclude=notes.md
JSON Schema Structured Output
(raw) {"command": "git clean -d -i --exclude=notes.md"}
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'module.o'}, {'type': 'dir_not_exists', 'path': '__pycache__'}, {'type': 'dir_not_exists', 'path': 'build'}]
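The percentages in the log are consistent with five equally weighted assertions: the three removal checks listed in the failure messages plus two that pass in every run. That is an inference, not something the log states. The sketch below shows how such a score could be computed; the `check` and `score` functions and the `file_exists` type are assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# Evaluate one assertion; returns 0 when it holds.
check() {
  case "$1" in
    file_not_exists) [ ! -f "$2" ] ;;
    dir_not_exists)  [ ! -d "$2" ] ;;
    file_exists)     [ -f "$2" ] ;;  # hypothetical type, not shown in the log
    *) echo "unknown assertion type: $1" >&2; return 2 ;;
  esac
}

# Read "type path" lines from stdin and print the pass percentage.
score() {
  total=0
  passed=0
  while read -r type path; do
    total=$((total + 1))
    if check "$type" "$path"; then
      passed=$((passed + 1))
    fi
  done
  [ "$total" -gt 0 ] || return 1
  echo $((passed * 100 / total))
}

# Recreate the state that `git clean -fX` leaves behind:
# module.o is gone, but the ignored directories remain.
work=$(mktemp -d)
cd "$work"
mkdir -p __pycache__ build/dist
echo 'src' > main.py
echo 'untracked' > notes.md

result=$(printf '%s\n' \
  'file_not_exists module.o' \
  'dir_not_exists __pycache__' \
  'dir_not_exists build' \
  'file_exists notes.md' \
  'file_exists main.py' | score)

echo "$result"   # prints 60
```

Under the same assumption, a command that removes nothing (for example one that git refuses without `-f`) would pass only the two preservation checks and score 40.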