GitBench
Remove untracked files in a specific directory
Tests the ability to remove untracked files only within a specific directory path, leaving untracked files elsewhere in place. Evaluates scoped cleanup.

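These commands set up the repository before the model sees the prompt. They define the starting file structure, index state, and Git history: one committed file (src/main.py) and three untracked files (src/temp.py, test/temp.py, root.txt).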

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. mkdir -p src test
  5. echo 'main' > src/main.py
  6. git add src test
  7. git commit -m 'Initial commit'
  8. echo 'temp' > src/temp.py
  9. echo 'fixture' > test/temp.py
  10. echo 'root' > root.txt
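To inspect the starting state by hand, the setup can be replayed in a throwaway directory. This is a minimal POSIX `sh` sketch, assuming `git` is on `PATH` and `mktemp -d` is available; the temporary directory and the final `git status` call are additions for illustration, not part of the benchmark harness.

```shell
# Replay the benchmark setup in a fresh temporary directory.
# Assumption: git and mktemp -d are available; nothing outside $repo is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
mkdir -p src test
echo 'main' > src/main.py
git add src test   # test/ is still empty, so only src/main.py is staged
git commit -q -m 'Initial commit'
echo 'temp' > src/temp.py
echo 'fixture' > test/temp.py
echo 'root' > root.txt

# --untracked-files=all lists files individually instead of collapsing test/ to "test/".
git status --porcelain --untracked-files=all
# ?? root.txt
# ?? src/temp.py
# ?? test/temp.py
```

Only `src/temp.py` is in scope for the prompt; `test/temp.py` and `root.txt` have to survive the cleanup.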
Prompt
Remove untracked files only inside the 'src' directory, leaving other untracked files alone. Output ONLY the git command, nothing else.
Expected
git clean -f src/
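One way to check the expected answer is to preview it with `git clean -n` (dry run) before forcing it with `-f`; `-f` is required because `clean.requireForce` defaults to true. The sketch below rebuilds the setup first so it is self-contained, assuming POSIX `sh`, `git`, and `mktemp -d`.

```shell
# Rebuild the starting state in a temporary directory (assumes git and mktemp -d).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
mkdir -p src test
echo 'main' > src/main.py
git add src test
git commit -q -m 'Initial commit'
echo 'temp' > src/temp.py
echo 'fixture' > test/temp.py
echo 'root' > root.txt

# Dry run: -n reports what would be deleted and changes nothing.
git clean -n src/    # Would remove src/temp.py

# The expected command; the pathspec src/ limits cleaning to that directory.
git clean -f src/    # Removing src/temp.py
```

The pathspec is what scopes the cleanup: without it, `git clean -f` would also delete `test/temp.py` and `root.txt`.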
deepseek/deepseek-v4-flash:high PASS 100% 74 in → 473 out (465 reasoning)
git clean -f src/
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 76 in → 907 out (888 reasoning)
git clean -f src/temp.py
JSON Schema Structured Output
(raw) { "command": "git clean -f src/temp.py" }
deepseek/deepseek-v4-flash:none PASS 100% 74 in → 7 out (0 reasoning)
git clean -f src/
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 74 in → 14 out (0 reasoning)
git clean -f src/
JSON Schema Structured Output
(raw) { "command": "git clean -f src/" }
mistralai/devstral-2512__json_schema PASS 100% 73 in → 12 out
git clean -f src/
JSON Schema Structured Output
(raw) {"command": "git clean -f src/"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 86 in → 243 out (249 reasoning)
git clean -f -- src/
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 86 in → 506 out (515 reasoning)
git clean -f -- src/
JSON Schema Structured Output
(raw) { "command": "git clean -f -- src/" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 86 in → 8 out (0 reasoning)
git clean -fdx src/
poolside/laguna-xs.2:high PASS 100% 120 in → 136 out (127 reasoning)
git clean -f src/
poolside/laguna-xs.2:high__json_schema PASS 100% 120 in → 135 out (119 reasoning)
git clean -f src
JSON Schema Structured Output
(raw) { "command": "git clean -f src" }
poolside/laguna-xs.2:none PASS 100% 120 in → 8 out (0 reasoning)
git clean -f src/
poolside/laguna-xs.2:none__json_schema PASS 100% 120 in → 11 out (0 reasoning)
git clean -f src
JSON Schema Structured Output
(raw) {"command": "git clean -f src"}
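The passing runs use several spellings: `src/` versus `src`, an explicit `--` before the pathspec, the single file `src/temp.py`, and the broader `-fdx`. The sketch below, assuming POSIX `sh`, `git`, and `mktemp -d`, runs each recorded variant against a fresh copy of the setup and checks that only `src/temp.py` disappears. `make_repo` is a hypothetical helper written for this illustration, not part of GitBench.

```shell
# Hypothetical helper: build a fresh copy of the benchmark repo and print its path.
make_repo() {
  dir=$(mktemp -d)
  (
    cd "$dir" || exit 1
    git init -q
    git config user.email 'test@test.com'
    git config user.name 'Test User'
    mkdir -p src test
    echo 'main' > src/main.py
    git add src test
    git commit -q -m 'Initial commit'
    echo 'temp' > src/temp.py
    echo 'fixture' > test/temp.py
    echo 'root' > root.txt
  ) >/dev/null 2>&1
  echo "$dir"
}

failures=0
# Argument lists copied from the passing runs above.
for args in '-f src/' '-f src' '-f -- src/' '-fdx src/' '-f src/temp.py'; do
  repo=$(make_repo)
  # $args is left unquoted on purpose so it splits into separate git arguments.
  (cd "$repo" && git clean $args >/dev/null)
  if [ ! -e "$repo/src/temp.py" ] && [ -f "$repo/src/main.py" ] &&
     [ -f "$repo/test/temp.py" ] && [ -f "$repo/root.txt" ]; then
    echo "ok    git clean $args"
  else
    echo "FAIL  git clean $args"
    failures=$((failures + 1))
  fi
done
```

All five should report `ok`. The `-fdx` form passes only because `src/` holds no ignored files or untracked subdirectories in this setup; on a real project `-x` would also delete ignored files under `src/`, and `-d` would remove untracked directories there.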
mistralai/devstral-2512 FAIL 75% 73 in → 14 out
```bash
git clean -f -d -- src/
```
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'src/temp.py'}]
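The log does not state why this run failed. Run on its own, the command inside the fence does remove `src/temp.py` (`-d` would additionally remove untracked directories under `src/`, of which there are none), which suggests the failed assertion stems from the Markdown fence wrapped around a response that was supposed to be the command only. Treat that as an inference, not a confirmed cause. A sketch checking the bare command, assuming POSIX `sh`, `git`, and `mktemp -d`:

```shell
# Rebuild the starting state in a temporary directory (assumes git and mktemp -d).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
mkdir -p src test
echo 'main' > src/main.py
git add src test
git commit -q -m 'Initial commit'
echo 'temp' > src/temp.py
echo 'fixture' > test/temp.py
echo 'root' > root.txt

# The command from inside the fenced response, executed without the fence.
git clean -f -d -- src/
```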
nvidia/nemotron-3-nano-30b-a3b:none__json_schema FAIL 75% 86 in → 17 out (0 reasoning)
git clean -Xd 'src/'
JSON Schema Structured Output
(raw) { "command": "git clean -Xd 'src/'" }
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'src/temp.py'}]
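This failure traces back to the command itself. It omits `-f`, and with `clean.requireForce` at its default of true, Git refuses to clean unless `-f`, `-n`, or `-i` is given. Even with `-f` added, `-X` restricts deletion to files Git ignores, and `src/temp.py` is untracked but not ignored. The sketch below shows both points, assuming POSIX `sh`, `git`, `mktemp -d`, and no global ignore rule that happens to match `temp.py`.

```shell
# Rebuild the starting state in a temporary directory (assumes git and mktemp -d).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
mkdir -p src test
echo 'main' > src/main.py
git add src test
git commit -q -m 'Initial commit'
echo 'temp' > src/temp.py
echo 'fixture' > test/temp.py
echo 'root' > root.txt

# As recorded: no -f, so Git refuses to run (clean.requireForce defaults to true).
git clean -Xd 'src/' || echo "git clean refused to run"

# Forced, but -X still limits deletion to ignored files, so nothing is removed.
git clean -f -Xd 'src/'

ls src    # src/temp.py is still present
```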