GitBench
Show which files changed in a specific commit
Tests the ability to identify which files changed in a specific commit by extracting the file list from git show output.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'alpha' > a.txt
  5. echo 'beta' > b.txt
  6. git add a.txt b.txt
  7. git commit -m 'Add both files'
  8. echo 'changed' > a.txt
  9. git add a.txt
  10. git commit -m 'Modify only a.txt'
  11. echo 'changed' >> b.txt
  12. git add b.txt
  13. git commit -m 'Modify only b.txt'
Prompt
Using git show --stat, which file was modified in the commit with message 'Modify only a.txt'? Output ONLY the filename, nothing else.
Expected
a.txt
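One way to reach the expected answer by hand is sketched below. It is an illustration, not the benchmark's own harness: the scratch directory from mktemp, the variable names repo, commit, and changed, and the -q flags are assumptions added here. The key step is resolving the commit by its message first, because a bare git show inspects HEAD, which in this repo is the 'Modify only b.txt' commit.

```shell
#!/bin/sh
set -eu

# Recreate the setup commands in a throwaway directory (assumed location).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'alpha' > a.txt
echo 'beta' > b.txt
git add a.txt b.txt
git commit -q -m 'Add both files'
echo 'changed' > a.txt
git add a.txt
git commit -q -m 'Modify only a.txt'
echo 'changed' >> b.txt
git add b.txt
git commit -q -m 'Modify only b.txt'

# Resolve the commit by its message instead of relying on HEAD.
commit=$(git log --fixed-strings --grep='Modify only a.txt' --format=%H -n 1)

# Human-readable per-file summary, with the commit header suppressed.
git show --stat --format= "$commit"

# Bare file list, easier to compare against an expected filename.
changed=$(git show --name-only --format= "$commit")
echo "$changed"
```

The --name-only form is used for the final value because --stat pads each filename with change counts and a summary line, which would need extra parsing before an exact comparison.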
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 228 in → 433 out (461 reasoning)
a.txt
JSON Schema Structured Output
(raw) {"filename": "a.txt"}
deepseek/deepseek-v4-flash:none PASS 100% 168 in → 3 out (0 reasoning)
a.txt
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 169 in → 9 out (0 reasoning)
a.txt
JSON Schema Structured Output
(raw) { "filename": "a.txt" }
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 206 in → 653 out (634 reasoning)
a.txt
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 207 in → 285 out (315 reasoning)
a.txt
JSON Schema Structured Output
(raw) { "filename": "a.txt" }
deepseek/deepseek-v4-flash:high FAIL 0% 165 in → 1,315 out (1,391 reasoning)
b.txt
Failure: Expected 'a.txt', got 'b.txt'
mistralai/devstral-2512 FAIL 0% 193 in → 106 out
The commit with the message "Modify only a.txt" is not shown in the provided output. The only commit shown is for "Modify only b.txt" which modified `b.txt`. If you're looking for the commit with message "Modify only a.txt", you would need to check the full git history. Based on the information given, there is no commit with that message in the provided output. However, if you're asking about the commit shown (which modifies b.txt), the file modified is: `b.txt`
Failure: Expected 'a.txt', got 'The commit with the message "Modify only a.txt" is not shown in the provided output. The only commit shown is for "Modify only b.txt" which modified `b.txt`. If you're looking for the commit with message "Modify only a.txt", you would need to check the full git history. Based on the information given, there is no commit with that message in the provided output. However, if you're asking about the commit shown (which modifies b.txt), the file modified is: `b.txt`'
b.txt
JSON Schema Structured Output
(raw) {"filename": "b.txt"}
Failure: Expected 'a.txt', got 'b.txt'
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 206 in → 3 out (0 reasoning)
b.txt
Failure: Expected 'a.txt', got 'b.txt'
nvidia/nemotron-3-nano-30b-a3b:none__json_schema FAIL 0% 205 in → 11 out (0 reasoning)
b.txt
JSON Schema Structured Output
(raw) { "filename": "b.txt" }
Failure: Expected 'a.txt', got 'b.txt'
poolside/laguna-xs.2:high FAIL 0% 243 in → 678 out (636 reasoning)
I don't see a commit with message 'Modify only a.txt' in the provided information. The commit shown has the message 'Modify only b.txt' and modified the file `b.txt`.
Failure: Expected 'a.txt', got ' I don't see a commit with message 'Modify only a.txt' in the provided information. The commit shown has the message 'Modify only b.txt' and modified the file `b.txt`. '
poolside/laguna-xs.2:high__json_schema FAIL 0% 244 in → 548 out (534 reasoning)
b.txt
JSON Schema Structured Output
(raw) { "filename": "b.txt" }
Failure: Expected 'a.txt', got 'b.txt'
poolside/laguna-xs.2:none FAIL 0% 245 in → 41 out (0 reasoning)
I don't see a commit with the message 'Modify only a.txt' in the provided commits. The only commit shown has the message 'Modify only b.txt' and modified file b.txt.
Failure: Expected 'a.txt', got ' I don't see a commit with the message 'Modify only a.txt' in the provided commits. The only commit shown has the message 'Modify only b.txt' and modified file b.txt. '
poolside/laguna-xs.2:none__json_schema FAIL 0% 245 in → 8 out (0 reasoning)
b.txt
JSON Schema Structured Output
(raw) {"filename": "b.txt"}
Failure: Expected 'a.txt', got 'b.txt'