GitBench
Count commits matching a grep pattern
Tests the ability to count commits whose messages match a grep pattern in git log. Evaluates log filtering and counting.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'hello' > file.txt
  5. git add file.txt
  6. git commit -m 'Initial commit'
  7. echo 'world' > file.txt
  8. git add file.txt
  9. git commit -m 'Fix: update greeting message'
  10. echo 'foo' > file.txt
  11. git add file.txt
  12. git commit -m 'Add feature bar'
  13. echo 'baz' > file.txt
  14. git add file.txt
  15. git commit -m 'Fix: resolve issue with greeting'
  16. echo 'qux' > file.txt
  17. git add file.txt
  18. git commit -m 'Update documentation'
Prompt
How many commits in this repository have 'Fix' in their commit message? Output ONLY the number, nothing else.
Expected
2
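For reference, one way to reach the expected count is sketched below. It rebuilds the setup history in a temporary directory and counts matching commits with git rev-list. The temporary directory and the commit_with helper are assumptions made for illustration and are not part of the benchmark; the benchmark itself does not state which command a model is expected to use.

```shell
#!/bin/sh
set -eu

# Assumption: work in a throwaway directory so no existing repo is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'

# commit_with is a hypothetical helper: overwrite file.txt, stage it, commit.
commit_with() {
    echo "$1" > file.txt
    git add file.txt
    git commit -q -m "$2"
}

commit_with hello 'Initial commit'
commit_with world 'Fix: update greeting message'
commit_with foo   'Add feature bar'
commit_with baz   'Fix: resolve issue with greeting'
commit_with qux   'Update documentation'

# --grep keeps only commits whose message matches the pattern (case-sensitive
# by default); --count prints how many commits remain instead of listing them.
count=$(git rev-list --count --grep='Fix' HEAD)
echo "$count"
```

An equivalent pipeline is git log --grep='Fix' --oneline | wc -l, which lists one line per matching commit and counts the lines. Adding -i to either command would make the match case-insensitive, which does not change the result for this history.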
deepseek/deepseek-v4-flash:high PASS 100% 705 in → 44 out (41 reasoning)
2
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 702 in → 85 out (74 reasoning)
2
JSON Schema Structured Output
(raw) { "count": 2 }
deepseek/deepseek-v4-flash:none PASS 100% 702 in → 1 out (0 reasoning)
2
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 712 in → 9 out (0 reasoning)
2
JSON Schema Structured Output
(raw) { "count": 2 }
mistralai/devstral-2512 PASS 100% 914 in → 2 out
2
mistralai/devstral-2512__json_schema PASS 100% 903 in → 7 out
2
JSON Schema Structured Output
(raw) {"count": 2}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 920 in → 120 out (128 reasoning)
2
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 935 in → 161 out (175 reasoning)
2
JSON Schema Structured Output
(raw) {"count": 2}
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 929 in → 2 out (0 reasoning)
2
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 913 in → 10 out (0 reasoning)
2
JSON Schema Structured Output
(raw) { "count": 2 }
poolside/laguna-xs.2:high PASS 100% 947 in → 115 out (111 reasoning)
2
poolside/laguna-xs.2:high__json_schema PASS 100% 960 in → 420 out (408 reasoning)
2
JSON Schema Structured Output
(raw) { "count": 2 }
poolside/laguna-xs.2:none PASS 100% 954 in → 3 out (0 reasoning)
2
poolside/laguna-xs.2:none__json_schema PASS 100% 947 in → 11 out (0 reasoning)
2
JSON Schema Structured Output
(raw) { "count": 2 }