GitBench
Remove a single untracked file
Tests the ability to remove a single untracked file with git clean -f and evaluates basic working-directory cleanup.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'tracked' > tracked.txt
  5. git add tracked.txt
  6. git commit -m 'Initial commit'
  7. echo 'untracked' > untracked.txt
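The setup steps above can be reproduced locally with a short script like the one below. It is a sketch, not the benchmark's own harness: running it in a fresh mktemp -d directory is an assumption made only for illustration. After setup, git status --porcelain should list untracked.txt with the ?? (untracked) marker, and git clean -n previews what git clean -f would delete without deleting anything.

```bash
#!/usr/bin/env bash
# Rebuild the task's starting state in a throwaway directory.
# Assumption: mktemp -d stands in for whatever sandbox GitBench actually uses.
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'tracked' > tracked.txt
git add tracked.txt
git commit -q -m 'Initial commit'
echo 'untracked' > untracked.txt

# tracked.txt is committed and unchanged, so only untracked.txt appears ("??" = untracked).
git status --porcelain
# Dry run: report what git clean -f would remove, without removing it.
git clean -n
```

Run as written, this should print "?? untracked.txt" followed by "Would remove untracked.txt", and both files remain on disk.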
Prompt
Remove the untracked file from the working directory. Output ONLY the git command, nothing else.
Expected
git clean -f
deepseek/deepseek-v4-flash:high PASS 100% 49 in → 277 out (268 reasoning)
git clean -f untracked.txt
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 49 in → 231 out (215 reasoning)
git clean -f untracked.txt
JSON Schema Structured Output
(raw) { "command": "git clean -f untracked.txt" }
deepseek/deepseek-v4-flash:none PASS 100% 49 in → 5 out (0 reasoning)
git clean -f
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 49 in → 17 out (0 reasoning)
git clean -f untracked.txt
JSON Schema Structured Output
(raw) { "command": "git clean -f untracked.txt" }
mistralai/devstral-2512__json_schema PASS 100% 48 in → 14 out
git clean -f untracked.txt
JSON Schema Structured Output
(raw) {"command": "git clean -f untracked.txt"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 61 in → 208 out (190 reasoning)
git clean -f untracked.txt
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 61 in → 155 out (139 reasoning)
git clean -f untracked.txt
JSON Schema Structured Output
(raw) { "command": "git clean -f untracked.txt" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 61 in → 9 out (0 reasoning)
git clean -f untracked.txt
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 61 in → 13 out (0 reasoning)
git clean -fd
JSON Schema Structured Output
(raw) { "command": "git clean -fd" }
poolside/laguna-xs.2:high PASS 100% 96 in → 97 out (87 reasoning)
git clean -f untracked.txt
poolside/laguna-xs.2:high__json_schema PASS 100% 96 in → 133 out (119 reasoning)
git clean -f untracked.txt
JSON Schema Structured Output
(raw) {"command": "git clean -f untracked.txt"}
poolside/laguna-xs.2:none PASS 100% 96 in → 9 out (0 reasoning)
git clean -f untracked.txt
poolside/laguna-xs.2:none__json_schema PASS 100% 96 in → 15 out (0 reasoning)
rm untracked.txt
JSON Schema Structured Output
(raw) { "command": "rm untracked.txt" }
mistralai/devstral-2512 FAIL 50% 48 in → 14 out
```bash git clean -f untracked.txt ```
Failure: Failed assertions: [{'type': 'file_not_exists', 'path': 'untracked.txt'}]
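The failure record suggests that grading checks the final working-tree state through assertions such as file_not_exists on untracked.txt, rather than matching the command text; that would explain why rm untracked.txt and git clean -fd also pass. The script below is a minimal sketch of that idea under stated assumptions: make_repo, file_not_exists, and grade are hypothetical names, and the extra check that tracked.txt survives is an illustrative assumption that does not appear in the report.

```bash
#!/usr/bin/env bash
# Sketch of a state-based grader. make_repo, file_not_exists and grade are
# hypothetical; GitBench's real harness is not shown in this report.
set -euo pipefail

# Build the task's starting repository in a new temp dir and print its path.
make_repo() {
  local dir
  dir="$(mktemp -d)"
  (
    cd "$dir"
    git init -q
    git config user.email 'test@test.com'
    git config user.name 'Test User'
    echo 'tracked' > tracked.txt
    git add tracked.txt
    git commit -q -m 'Initial commit'
    echo 'untracked' > untracked.txt
  ) >/dev/null
  echo "$dir"
}

# Mirrors {'type': 'file_not_exists', 'path': ...}: succeed only if the path is gone.
file_not_exists() {
  [ ! -e "$1/$2" ]
}

# Run a candidate command in a fresh repo, then judge only the resulting state.
# The tracked.txt check is an extra assumption, not taken from the report.
grade() {
  local cmd="$1" repo
  repo="$(make_repo)"
  # Discard the command's output and exit status; only the final state counts.
  (cd "$repo" && eval "$cmd") >/dev/null 2>&1 || true
  if file_not_exists "$repo" untracked.txt && [ -f "$repo/tracked.txt" ]; then
    echo "PASS: $cmd"
  else
    echo "FAIL: $cmd"
  fi
  rm -rf "$repo"
}

grade 'git clean -f'
grade 'git clean -fd'
grade 'rm untracked.txt'
grade 'true'   # a no-op leaves untracked.txt in place
```

This prints PASS for the first three commands and FAIL for the no-op. Because only the end state is inspected, any command that deletes untracked.txt while leaving tracked.txt alone is accepted, which is consistent with the spread of passing answers above.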