GitBench
Count total lines changed in a commit from stat output
Tests whether a model can read the total number of insertions in a commit from git log --stat output. Evaluates quantitative interpretation of the diffstat summary.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'start' > readme.txt
  5. git add readme.txt
  6. git commit -m 'Add readme'
  7. printf 'line1\nline2\nline3\n' > code.py
  8. printf 'a\nb\nc\nd\ne\n' > docs.md
  9. git add code.py docs.md
  10. git commit -m 'Add code and docs'
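A minimal shell sketch that rebuilds this fixture in a temporary directory and counts the insertions of the 'Add code and docs' commit. It assumes git is on PATH and that the two printf strings separate lines with \n, which is what the expected count of 3 + 5 = 8 implies. Besides printing the git log --stat view the prompt refers to, it cross-checks the number by summing the first column of git show --numstat; that is a swapped-in, machine-readable technique for verification, not the benchmark's own grader.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the GitBench fixture in a throwaway directory and count the
# insertions of the 'Add code and docs' commit. Requires git on PATH.
set -eu

repo=$(mktemp -d)
cd "$repo"

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'start' > readme.txt
git add readme.txt
git commit -q -m 'Add readme'

# Assumption: the fixture's printf strings use \n, giving 3 and 5 lines.
printf 'line1\nline2\nline3\n' > code.py
printf 'a\nb\nc\nd\ne\n' > docs.md
git add code.py docs.md
git commit -q -m 'Add code and docs'

# Locate the commit by its message rather than assuming it is HEAD.
commit=$(git log --format=%H -n 1 --grep='Add code and docs')

# The human-readable view the prompt refers to (ends with the summary line).
git log --stat -n 1 "$commit"

# --numstat prints "<added> <deleted> <path>" (tab-separated) per file;
# summing the first column gives the commit's total insertions.
total=$(git show --numstat --format= "$commit" | awk '{ sum += $1 } END { print sum }')
echo "$total"
```

Using --numstat avoids parsing English text, so the cross-check does not depend on the pluralization or wording of the stat summary.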
Prompt
In the commit with message 'Add code and docs', how many total insertions were made according to git log --stat? Output ONLY the number, nothing else.
Expected
8
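The expected value can be checked by hand: the commit adds code.py (3 lines) and docs.md (5 lines), so the last line of its git log --stat entry reads " 2 files changed, 8 insertions(+)". The sketch below reads the number off such a summary line with sed; insertions_from_summary is a hypothetical helper name used only for illustration. It prints 0 when the line has no insertions clause, which git omits when a commit only deletes lines.

```shell
# Hypothetical helper: print the insertion count from a git diffstat summary
# line such as " 2 files changed, 8 insertions(+)". Prints 0 if absent.
insertions_from_summary() {
  local n
  # Match "N insertion(+)" or "N insertions(+)" and keep only N.
  n=$(printf '%s\n' "$1" | sed -n 's/.* \([0-9][0-9]*\) insertions\{0,1\}(+).*/\1/p')
  printf '%s\n' "${n:-0}"
}

insertions_from_summary ' 2 files changed, 8 insertions(+)'   # prints 8
```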
deepseek/deepseek-v4-flash:high PASS 100% 318 in → 46 out (44 reasoning)
8
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 324 in → 63 out (55 reasoning)
8
JSON Schema Structured Output
(raw) { "count": 8 }
deepseek/deepseek-v4-flash:none PASS 100% 319 in → 2 out (0 reasoning)
8
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 318 in → 7 out (0 reasoning)
8
JSON Schema Structured Output
(raw) {"count": 8}
mistralai/devstral-2512 PASS 100% 397 in → 2 out
8
mistralai/devstral-2512__json_schema PASS 100% 401 in → 7 out
8
JSON Schema Structured Output
(raw) {"count": 8}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 418 in → 211 out (166 reasoning)
8
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 418 in → 111 out (90 reasoning)
8
JSON Schema Structured Output
(raw) { "count": 8 }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 415 in → 2 out (0 reasoning)
8
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 414 in → 8 out (0 reasoning)
8
JSON Schema Structured Output
(raw) { "count": 8 }
poolside/laguna-xs.2:high PASS 100% 452 in → 204 out (201 reasoning)
8
poolside/laguna-xs.2:high__json_schema PASS 100% 448 in → 202 out (193 reasoning)
8
JSON Schema Structured Output
(raw) {"count": 8}
poolside/laguna-xs.2:none PASS 100% 449 in → 3 out (0 reasoning)
8
poolside/laguna-xs.2:none__json_schema PASS 100% 441 in → 7 out (0 reasoning)
8
JSON Schema Structured Output
(raw) {"count": 8}