GitBench
Show binary file change in commit
Tests the ability to inspect a binary file change in a commit, evaluating how binary changes are handled in commit display.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. printf '‰PNG  binary' > image.png
  5. git add image.png
  6. git commit -m 'Add binary image'
  7. printf '‰PNG  updated' > image.png
  8. git add image.png
  9. git commit -m 'Update binary image'
Prompt
Using git show --stat, which file was changed in the 'Update binary image' commit? Output ONLY the filename, nothing else.
Expected
image.png
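For reference, the expected answer can be checked by hand. The following is a minimal sketch, not part of the benchmark itself: it rebuilds the fixture in a throwaway directory and extracts the changed path from the diffstat. The byte payloads (`\211PNG\0...`) are stand-ins chosen so that Git treats the file as binary, not necessarily the exact bytes of the original setup, and the variable names `repo`, `commit`, and `changed` are illustrative.

```shell
#!/bin/sh
# Sketch: recreate the fixture and list the files touched by the
# 'Update binary image' commit, using git show --stat.
set -eu

repo=$(mktemp -d)
cd "$repo"

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'

# Assumption: stand-in payloads. The NUL byte (\0) is what makes Git
# classify the file as binary; the original fixture bytes may differ.
printf '\211PNG\0binary' > image.png
git add image.png
git commit -q -m 'Add binary image'

printf '\211PNG\0updated' > image.png
git add image.png
git commit -q -m 'Update binary image'

# Resolve the commit by its message rather than assuming it is HEAD.
commit=$(git log --grep='Update binary image' --format=%H -n 1)

# --format= suppresses the commit header, leaving only the diffstat.
# Per-file stat lines look like " <path> | <summary>", so keep the text
# before the first '|' and trim surrounding whitespace. The trailing
# "N file(s) changed" summary line has no '|' and is skipped.
changed=$(git show --stat --format= "$commit" |
  awk -F'|' 'NF > 1 { gsub(/^[ \t]+|[ \t]+$/, "", $1); print $1 }')

printf '%s\n' "$changed"
```

For a binary file the diffstat reports a byte-size change (`Bin ... bytes`) instead of insertion and deletion counts, but the path still appears before the `|`, so the same extraction applies. If only the path is needed, `git show --name-only --format= "$commit"` is a simpler alternative, although the prompt specifically asks for `--stat`.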
deepseek/deepseek-v4-flash:high PASS 100% 195 in → 49 out (47 reasoning)
image.png
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 201 in → 74 out (62 reasoning)
image.png
JSON Schema Structured Output
(raw) { "filename": "image.png" }
deepseek/deepseek-v4-flash:none PASS 100% 197 in → 3 out (0 reasoning)
image.png
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 200 in → 10 out (0 reasoning)
image.png
JSON Schema Structured Output
(raw) { "filename": "image.png" }
mistralai/devstral-2512 PASS 100% 222 in → 3 out
image.png
mistralai/devstral-2512__json_schema PASS 100% 222 in → 8 out
image.png
JSON Schema Structured Output
(raw) {"filename": "image.png"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 234 in → 73 out (74 reasoning)
image.png
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 234 in → 56 out (48 reasoning)
image.png
JSON Schema Structured Output
(raw) { "filename": "image.png" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 235 in → 3 out (0 reasoning)
image.png
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 233 in → 12 out (0 reasoning)
image.png
JSON Schema Structured Output
(raw) { "filename": "image.png" }
poolside/laguna-xs.2:high PASS 100% 270 in → 106 out (101 reasoning)
image.png
poolside/laguna-xs.2:high__json_schema PASS 100% 265 in → 134 out (125 reasoning)
image.png
JSON Schema Structured Output
(raw) {"filename": "image.png"}
poolside/laguna-xs.2:none PASS 100% 270 in → 4 out (0 reasoning)
image.png
poolside/laguna-xs.2:none__json_schema PASS 100% 271 in → 8 out (0 reasoning)
image.png
JSON Schema Structured Output
(raw) {"filename": "image.png"}