GitBench
Dry-run preview of untracked file removal
Tests the ability to preview untracked file removal with git clean -n (dry run). Evaluates a safe-cleanup workflow.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'tracked' > tracked.txt
  5. git add tracked.txt
  6. git commit -m 'Initial commit'
  7. echo 'untracked' > untracked.txt
  8. echo 'another' > another.txt
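The setup above can be reproduced locally as a minimal sketch. The temporary directory, the `repo` and `preview` variable names, and the `-q` flags are assumptions added for illustration and are not part of the benchmark harness.

```sh
#!/bin/sh
set -eu

# Hypothetical scratch location; the original setup does not name a directory.
repo=$(mktemp -d)
cd "$repo"

# The eight setup commands (-q only silences Git's progress output).
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'tracked' > tracked.txt
git add tracked.txt
git commit -q -m 'Initial commit'
echo 'untracked' > untracked.txt
echo 'another' > another.txt

# Dry run: list what git clean would remove without deleting anything.
preview=$(git clean -n)
printf '%s\n' "$preview"
```

After the script runs, both untracked files should still be on disk, and the preview should name them while leaving tracked.txt unmentioned.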
Prompt
Show what files would be removed by git clean without actually removing them. Output ONLY the git command, nothing else.
Expected
git clean -f -n
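The expected answer includes -f, while most recorded answers omit it. The sketch below compares the two forms on the same fixture, assuming Git's default configuration, where the clean.requireForce setting only guards actual deletion. The variable names `plain` and `forced` and the scratch directory are hypothetical, and this is not a description of how the benchmark grades answers.

```sh
#!/bin/sh
set -eu

# Hypothetical scratch repository mirroring the setup commands.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'tracked' > tracked.txt
git add tracked.txt
git commit -q -m 'Initial commit'
echo 'untracked' > untracked.txt
echo 'another' > another.txt

# Capture the dry-run listing with and without -f.
plain=$(git clean -n)
forced=$(git clean -f -n)

# Report whether the two previews match; neither call deletes files.
if [ "$plain" = "$forced" ]; then
    echo 'same preview'
else
    echo 'previews differ'
fi
```

Because -n takes effect in both invocations, the untracked files remain in place afterwards.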
deepseek/deepseek-v4-flash:high PASS 100% 59 in → 223 out (217 reasoning)
git clean -n
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 61 in → 127 out (113 reasoning)
git clean -n
JSON Schema Structured Output
(raw) { "command": "git clean -n" }
deepseek/deepseek-v4-flash:none PASS 100% 59 in → 4 out (0 reasoning)
git clean -n
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 61 in → 12 out (0 reasoning)
git clean -n
JSON Schema Structured Output
(raw) { "command": "git clean -n" }
mistralai/devstral-2512 PASS 100% 58 in → 10 out
```sh git clean -n ```
mistralai/devstral-2512__json_schema PASS 100% 58 in → 10 out
git clean -n
JSON Schema Structured Output
(raw) {"command": "git clean -n"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 71 in → 184 out (193 reasoning)
git clean -n
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 71 in → 432 out (472 reasoning)
git clean -n
JSON Schema Structured Output
(raw) { "command": "git clean -n" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 71 in → 5 out (0 reasoning)
git clean -nx
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 71 in → 10 out (0 reasoning)
git clean -n
JSON Schema Structured Output
(raw) {"command": "git clean -n"}
poolside/laguna-xs.2:high PASS 100% 107 in → 141 out (134 reasoning)
git clean -n
poolside/laguna-xs.2:high__json_schema PASS 100% 107 in → 146 out (135 reasoning)
git clean -n
JSON Schema Structured Output
(raw) {"command": "git clean -n"}
poolside/laguna-xs.2:none PASS 100% 107 in → 6 out (0 reasoning)
git clean -n
poolside/laguna-xs.2:none__json_schema PASS 100% 107 in → 10 out (0 reasoning)
git clean -n
JSON Schema Structured Output
(raw) {"command": "git clean -n"}