GitBench
Identify the most-changed file from stat output
Tests the ability to identify the most-changed file from stat output: the model must parse git log --stat and compare each file's count of changed lines (insertions plus deletions).
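Here is a minimal sketch of that parsing step in shell. It assumes the default --stat layout, where each file line reads " path | N +++---" and N is insertions plus deletions. The function name most_changed is hypothetical. The sample input is an illustrative stat block for the final commit of the setup below, worked out by hand on the assumption that each printf writes one item per line.

```sh
#!/bin/sh
# Hypothetical helper: read git log --stat style text on stdin and print the path
# with the largest change count. Assumes the default " path | N ++--" layout.
most_changed() {
  awk -F'|' '
    NF == 2 {                                  # per-file lines contain exactly one "|"
      path = $1
      gsub(/^ +| +$/, "", path)                # trim the padding around the path
      split($2, f, " ")                        # f[1] is the insertions + deletions count
      if (f[1] ~ /^[0-9]+$/ && f[1] + 0 > best + 0) {   # skips "Bin ..." lines
        best = f[1] + 0
        name = path
      }
    }
    END { if (name != "") print name }         # ties keep the first file listed
  '
}

# Illustrative stat block for the final commit of the setup below,
# assuming each printf writes one item per line.
most_changed <<'EOF'
 large.txt  | 17 +++++++++++++++++
 medium.txt | 13 +++++++++++++
 small.txt  |  2 ++
 3 files changed, 32 insertions(+)
EOF
```

Against a real repository, pipe only the latest commit, for example git log -1 --stat | most_changed. Over a longer log the helper returns the single largest entry across all commits. Commit-message lines that happen to contain a "|" followed by a number could also be miscounted.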

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. printf 'line1\nline2\nline3\n' > small.txt
  5. printf 'a\nb\n' > medium.txt
  6. printf 'x\ny\nz\nw\nv\nu\nt\ns\nr\n' > large.txt
  7. git add small.txt medium.txt large.txt
  8. git commit -m 'Initial commit with three files'
  9. printf 'line1\nline2\nline3\nline4\nline5\n' > small.txt
  10. printf 'a\nb\nc\nd\ne\nf\ng\nh\ni\nj\nk\nl\nm\nn\no\n' > medium.txt
  11. printf 'x\ny\nz\nw\nv\nu\nt\ns\nr\nq\np\no\nn\nm\nl\nk\nj\ni\nh\ng\nf\ne\nd\nc\nb\na\n' > large.txt
  12. git add small.txt medium.txt large.txt
  13. git commit -m 'Update all files'
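You can check the expected answer by replaying the setup in a throwaway repository. This sketch assumes each printf writes one item per line (newline-separated). That is what gives each file a different count: with everything on a single line, every file would show the same change count. The helper name most_changed_in_head is hypothetical. It reads git diff --numstat instead of the --stat graph because numstat prints tab-separated added and deleted counts with full paths.

```sh
#!/bin/sh
# Sketch: replay the setup in a throwaway repository and compute the answer.
# Assumption: each printf writes one item per line ('\n'-separated).
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
printf 'line1\nline2\nline3\n' > small.txt
printf 'a\nb\n' > medium.txt
printf 'x\ny\nz\nw\nv\nu\nt\ns\nr\n' > large.txt
git add small.txt medium.txt large.txt
git commit -q -m 'Initial commit with three files'
printf 'line1\nline2\nline3\nline4\nline5\n' > small.txt
printf 'a\nb\nc\nd\ne\nf\ng\nh\ni\nj\nk\nl\nm\nn\no\n' > medium.txt
printf 'x\ny\nz\nw\nv\nu\nt\ns\nr\nq\np\no\nn\nm\nl\nk\nj\ni\nh\ng\nf\ne\nd\nc\nb\na\n' > large.txt
git add small.txt medium.txt large.txt
git commit -q -m 'Update all files'

# Hypothetical helper. --numstat prints "added<TAB>deleted<TAB>path";
# the --stat count is added + deleted, and binary files ("-<TAB>-") count as 0.
most_changed_in_head() {
  git diff --numstat HEAD~1 HEAD |
    awk -F'\t' '$1 + $2 > best + 0 { best = $1 + $2; name = $3 } END { print name }'
}

most_changed_in_head
```

The script leaves you in the temporary repository. Running git log -1 --stat there shows the human-readable view the prompt refers to. Under the one-item-per-line assumption it should list large.txt with 17 changed lines, medium.txt with 13 and small.txt with 2.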
Prompt
In the most recent commit, which file had the most lines changed according to git log --stat? Output ONLY the filename, nothing else.
Expected
large.txt
Results (token counts: input → output, with reasoning tokens in parentheses where reported)
deepseek/deepseek-v4-flash:high PASS 100% 349 in → 68 out (64 reasoning)
large.txt
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 354 in → 110 out (101 reasoning)
large.txt
JSON Schema Structured Output
(raw) { "filename": "large.txt" }
deepseek/deepseek-v4-flash:none PASS 100% 345 in → 3 out (0 reasoning)
large.txt
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 346 in → 12 out (0 reasoning)
large.txt
JSON Schema Structured Output
(raw) { "filename": "large.txt" }
mistralai/devstral-2512 PASS 100% 448 in → 3 out
large.txt
mistralai/devstral-2512__json_schema PASS 100% 450 in → 8 out
large.txt
JSON Schema Structured Output
(raw) {"filename": "large.txt"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 458 in → 153 out (138 reasoning)
large.txt
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 456 in → 122 out (108 reasoning)
large.txt
JSON Schema Structured Output
(raw) { "filename": "large.txt" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 458 in → 3 out (0 reasoning)
large.txt
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 452 in → 10 out (0 reasoning)
large.txt
JSON Schema Structured Output
(raw) { "filename": "large.txt" }
poolside/laguna-xs.2:high PASS 100% 483 in → 124 out (119 reasoning)
large.txt
poolside/laguna-xs.2:high__json_schema PASS 100% 473 in → 254 out (246 reasoning)
large.txt
JSON Schema Structured Output
(raw) {"filename": "large.txt"}
poolside/laguna-xs.2:none PASS 100% 477 in → 4 out (0 reasoning)
large.txt
poolside/laguna-xs.2:none__json_schema PASS 100% 482 in → 13 out (0 reasoning)
large.txt
JSON Schema Structured Output
(raw) { "filename": "large.txt" }