GitBench
Combined cleanup of untracked and ignored files, including directories, in a specific path
Tests the ability to clean up untracked and ignored files, including directories, within a specific path. Evaluates advanced multi-flag cleanup.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. printf '*.log\ntmp/\n' > .gitignore
  5. mkdir -p src
  6. echo 'main' > src/app.py
  7. git add .gitignore src
  8. git commit -m 'Initial commit'
  9. echo 'debug' > src/debug.log
  10. mkdir -p src/tmp
  11. echo 'cache' > src/tmp/data
  12. echo 'untracked' > src/new.py
  13. echo 'root junk' > root.log
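Replaying the setup commands in a scratch directory makes the starting state easy to inspect. This is a minimal sketch assuming a POSIX shell with git on PATH; `mktemp -d` is an assumption (any empty directory works), and the directory is left behind afterwards. `git status --porcelain --ignored` labels each leftover path: `??` for untracked, `!!` for ignored.

```shell
#!/bin/sh
# Rebuild the benchmark's starting state in a throwaway directory.
set -eu
repo="$(mktemp -d)"   # assumption: mktemp -d is available
cd "$repo"

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
printf '*.log\ntmp/\n' > .gitignore
mkdir -p src
echo 'main' > src/app.py
git add .gitignore src
git commit -q -m 'Initial commit'
echo 'debug' > src/debug.log
mkdir -p src/tmp
echo 'cache' > src/tmp/data
echo 'untracked' > src/new.py
echo 'root junk' > root.log

# '??' marks untracked paths, '!!' marks ignored paths.
git status --porcelain --ignored
```

Expect `src/new.py` to be the only untracked path, with `src/debug.log`, `src/tmp/` and `root.log` reported as ignored. `root.log` is the ignored file outside `src/` that a correct answer must leave alone.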
Prompt
Remove all untracked and ignored files and directories from the src/ directory only. Output ONLY the git command, nothing else.
Expected
git clean -f -x -d src/
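A minimal sketch of checking the expected command against the conditions the benchmark asserts, assuming a POSIX shell, git on PATH, and `mktemp -d`. It rebuilds the fixture, previews the deletion with `-n` (dry run, prints "Would remove ..." lines and deletes nothing), then runs the expected command and tests what remains. The removal checks mirror the `file_not_exists`/`dir_not_exists` assertions in the evidence; the two survival checks (`src/app.py`, `root.log`) are added here for illustration.

```shell
#!/bin/sh
# Rebuild the fixture, preview the cleanup, run it, then verify the result.
set -eu
repo="$(mktemp -d)"   # assumption: mktemp -d is available
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
printf '*.log\ntmp/\n' > .gitignore
mkdir -p src
echo 'main' > src/app.py
git add .gitignore src
git commit -q -m 'Initial commit'
echo 'debug' > src/debug.log
mkdir -p src/tmp
echo 'cache' > src/tmp/data
echo 'untracked' > src/new.py
echo 'root junk' > root.log

# -n: dry run, list what would be removed without deleting anything.
git clean -n -x -d src/

# Expected answer: -f forces deletion, -x also removes ignored files,
# -d covers untracked directories, and the src/ pathspec limits the scope.
git clean -f -x -d src/

# Untracked and ignored content under src/ is gone.
test ! -e src/new.py
test ! -e src/debug.log
test ! -e src/tmp
# Tracked files and ignored files outside src/ survive.
test -f src/app.py
test -f root.log
echo 'all checks passed'
```

Flag order and bundling do not matter, so `git clean -fdx src/` and `git clean -xfd src/` are the same command. The git-clean documentation also notes that `-d` is irrelevant once a pathspec is given; the `-fX src/` run in the evidence is consistent with that, since its only failed assertion concerns `src/new.py`.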
deepseek/deepseek-v4-flash:high PASS 100% 63 in → 943 out (934 reasoning)
git clean -fdx src/
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 63 in → 210 out (195 reasoning)
git clean -fdx src/
JSON Schema Structured Output
(raw) { "command": "git clean -fdx src/" }
deepseek/deepseek-v4-flash:none PASS 100% 63 in → 9 out (0 reasoning)
git -C src clean -fdx
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 75 in → 313 out (316 reasoning)
git clean -fdx -- src/
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 75 in → 864 out (935 reasoning)
git clean -xfd src/
JSON Schema Structured Output
(raw) { "command": "git clean -xfd src/" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 75 in → 8 out (0 reasoning)
git clean -fdx src/
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 75 in → 13 out (0 reasoning)
git clean -fdx src/
JSON Schema Structured Output
(raw) {"command": "git clean -fdx src/"}
poolside/laguna-xs.2:high__json_schema PASS 100% 109 in → 288 out (270 reasoning)
git clean -fdx src/
JSON Schema Structured Output
(raw) { "command": "git clean -fdx src/" }
deepseek/deepseek-v4-flash:none__json_schema FAIL 83.3% 65 in → 14 out (0 reasoning)
git clean -fX src/
JSON Schema Structured Output
(raw) { "command": "git clean -fX src/" }
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'src/new.py'}]
mistralai/devstral-2512__json_schema FAIL 83.3% 62 in → 13 out
git clean -fdX src/
JSON Schema Structured Output
(raw) {"command": "git clean -fdX src/"}
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'src/new.py'}]
poolside/laguna-xs.2:high FAIL 66.7% 109 in → 158 out (149 reasoning)
git clean -fd src/
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'src/debug.log'}, {'type': 'dir_not_exists', 'path': 'src/tmp'}]
poolside/laguna-xs.2:none FAIL 66.7% 109 in → 8 out (0 reasoning)
git clean -fd src/
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'src/debug.log'}, {'type': 'dir_not_exists', 'path': 'src/tmp'}]
poolside/laguna-xs.2:none__json_schema FAIL 66.7% 109 in → 11 out (0 reasoning)
git clean -fd src
JSON Schema Structured Output
(raw) {"command": "git clean -fd src"}
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'src/debug.log'}, {'type': 'dir_not_exists', 'path': 'src/tmp'}]
mistralai/devstral-2512 FAIL 50% 62 in → 11 out
```bash
git clean -fd src/
```
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'src/debug.log'}, {'type': 'dir_not_exists', 'path': 'src/tmp'}, {'type': 'file_not_exists', 'path': 'src/new.py'}]
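The failing answers differ from the expected one only in their flags. The sketch below replays three of the submitted flag sets (`-fdx`, `-fX`, `-fd`) on fresh copies of the fixture and prints which of the asserted paths survive. `make_fixture` is a hypothetical helper name; the script assumes a POSIX shell, git on PATH, and `mktemp -d`, and leaves its temp directories behind.

```shell
#!/bin/sh
# Compare submitted flag sets on fresh copies of the benchmark fixture.
set -eu

# Hypothetical helper: build the starting state in a new temp dir, print its path.
make_fixture() {
  dir="$(mktemp -d)"
  git -C "$dir" init -q
  git -C "$dir" config user.email 'test@test.com'
  git -C "$dir" config user.name 'Test User'
  printf '*.log\ntmp/\n' > "$dir/.gitignore"
  mkdir -p "$dir/src"
  echo 'main' > "$dir/src/app.py"
  git -C "$dir" add .gitignore src
  git -C "$dir" commit -q -m 'Initial commit'
  echo 'debug' > "$dir/src/debug.log"
  mkdir -p "$dir/src/tmp"
  echo 'cache' > "$dir/src/tmp/data"
  echo 'untracked' > "$dir/src/new.py"
  echo 'root junk' > "$dir/root.log"
  echo "$dir"
}

for flags in -fdx -fX -fd; do
  repo="$(make_fixture)"
  # Discard the "Removing ..." lines; only the surviving paths matter here.
  git -C "$repo" clean "$flags" src/ > /dev/null
  left=''
  for p in src/new.py src/debug.log src/tmp; do
    if [ -e "$repo/$p" ]; then left="$left $p"; fi
  done
  echo "git clean $flags src/ leaves:${left:- nothing}"
done
```

By git's documented semantics, `-fdx` should leave nothing, `-fX` should leave only `src/new.py` (it removes ignored files only), and `-fd` should leave `src/debug.log` and `src/tmp` (ignored paths are kept without `-x`). That lines up with the failed assertions recorded for the `-fX` run and the poolside `-fd` runs. The passing `git -C src clean -fdx` works for a related reason: with no pathspec, git clean starts from the current directory, and `-C src` makes that `src/`.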