GitBench
Show file rename detection in commit
Tests whether a model can recognize a file rename when inspecting a commit, which requires knowing how Git's rename detection reports it in git show output.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'content' > old_name.txt
  5. git add old_name.txt
  6. git commit -m 'Add original file'
  7. git mv old_name.txt new_name.txt
  8. git commit -m 'Rename file'
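The setup can be replayed locally to see what the prompt's command actually prints. The sketch below is a POSIX shell script that assumes Git 2.9 or later (where diff.renames defaults to true) and a disposable temporary directory from mktemp -d; the variable names repo and out are illustrative only. In a fresh repository, git show --name-status reports the rename as a single entry whose status field is R100, where 100 is the similarity percentage between the old and new file.

```shell
#!/bin/sh
# Sketch: replay the setup steps in a throwaway repository and inspect the
# 'Rename file' commit. Assumes Git >= 2.9, where diff.renames defaults to true.
set -eu

repo=$(mktemp -d)   # disposable directory (assumption: safe to create here)
cd "$repo"

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'content' > old_name.txt
git add old_name.txt
git commit -q -m 'Add original file'
git mv old_name.txt new_name.txt
git commit -q -m 'Rename file'

# HEAD is the 'Rename file' commit. --oneline keeps the header to a single
# line, so the name-status entry ends up as the last line of the output.
out=$(git show --name-status --oneline HEAD)
printf '%s\n' "$out"
```

The last line of the printout should be R100, a tab, old_name.txt, a tab, and new_name.txt; the benchmark's expected answer is just the leading letter of that status field.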
Prompt
Using git show --name-status, what is the status code for the file rename in the 'Rename file' commit? Output ONLY the single letter code (e.g. R for rename), nothing else.
Expected
R
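Git prints the status letter followed by a similarity score, so the single-letter answer is the first character of that field. Rename detection is a heuristic applied when the diff is generated, not something recorded in the commit, so the same commit appears as an addition plus a deletion when detection is turned off. The following sketch, again a POSIX shell script assuming Git 2.9 or later and no global configuration that disables diff.renames, extracts the letter and contrasts it with the output of --no-renames; the variable names code and plain are hypothetical.

```shell
#!/bin/sh
# Sketch: extract the one-letter status code for the 'Rename file' commit and
# compare it with the output when rename detection is disabled.
# Assumes Git >= 2.9 (diff.renames defaults to true) and a disposable temp dir.
set -eu

cd "$(mktemp -d)"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'content' > old_name.txt
git add old_name.txt
git commit -q -m 'Add original file'
git mv old_name.txt new_name.txt
git commit -q -m 'Rename file'

# Default behaviour: one entry "R100<TAB>old_name.txt<TAB>new_name.txt".
# tail keeps the last line (the entry); cut -c1 keeps only the status letter.
code=$(git show --name-status --oneline HEAD | tail -n 1 | cut -c1)
printf 'rename detection on:  %s\n' "$code"

# With --no-renames the commit is reported per path (sorted by path name):
# an addition of new_name.txt and a deletion of old_name.txt.
plain=$(git show --name-status --no-renames --oneline HEAD | tail -n +2 | cut -f1 | tr '\n' ' ')
printf 'rename detection off: %s\n' "$plain"
```

On a default Git installation the first line shows R and the second shows A D, which is why the prompt's answer depends on rename detection being active.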
deepseek/deepseek-v4-flash:high PASS 100% 168 in → 115 out (116 reasoning)
R
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 168 in → 159 out (147 reasoning)
R
JSON Schema Structured Output
(raw) { "status_code": "R" }
deepseek/deepseek-v4-flash:none PASS 100% 169 in → 2 out (0 reasoning)
R
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 170 in → 9 out (0 reasoning)
R
JSON Schema Structured Output
(raw) { "status_code": "R" }
mistralai/devstral-2512 PASS 100% 187 in → 2 out
R
mistralai/devstral-2512__json_schema PASS 100% 187 in → 8 out
R
JSON Schema Structured Output
(raw) {"status_code": "R"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 206 in → 199 out (201 reasoning)
R
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 198 in → 99 out (99 reasoning)
R
JSON Schema Structured Output
(raw) { "status_code": "R" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 204 in → 2 out (0 reasoning)
R
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 206 in → 11 out (0 reasoning)
R
JSON Schema Structured Output
(raw) { "status_code": "R" }
poolside/laguna-xs.2:high PASS 100% 236 in → 200 out (196 reasoning)
R
poolside/laguna-xs.2:high__json_schema PASS 100% 239 in → 217 out (208 reasoning)
R
JSON Schema Structured Output
(raw) {"status_code": "R"}
poolside/laguna-xs.2:none PASS 100% 239 in → 3 out (0 reasoning)
R
poolside/laguna-xs.2:none__json_schema PASS 100% 238 in → 8 out (0 reasoning)
R
JSON Schema Structured Output
(raw) {"status_code": "R"}