GitBench
Identify specific commit from oneline output
Tests whether the model can pick out one commit by its message from git log --oneline output and report that commit's short hash. Evaluates parsing of compact log formats.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'v1' > app.py
  5. git add app.py
  6. git commit -m 'Initial application setup'
  7. echo 'v2' > app.py
  8. git add app.py
  9. git commit -m 'Add error handling'
  10. echo 'v3' > app.py
  11. git add app.py
  12. git commit -m 'Fix null pointer bug'
  13. echo 'v4' > app.py
  14. git add app.py
  15. git commit -m 'Refactor module structure'
  16. echo 'v5' > app.py
  17. git add app.py
  18. git commit -m 'Add unit tests'
Prompt
Using git log --oneline, what is the short hash of the commit with message 'Fix null pointer bug'? Output ONLY the short hash, nothing else.
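The lookup the prompt asks for can be sketched as a small parser over git log --oneline output, where each line is a short hash followed by the commit subject. This is a minimal illustration, not part of the benchmark: the function name short_hash_for is hypothetical, and the hashes in the sample are placeholders, since real hashes depend on commit timestamps and so differ from run to run.

```python
from typing import Optional


def short_hash_for(oneline_output: str, subject: str) -> Optional[str]:
    """Return the short hash of the first commit whose subject equals `subject`."""
    for line in oneline_output.splitlines():
        # Each line looks like "<short-hash> <subject>"; split on the first space only.
        short_hash, _, message = line.strip().partition(" ")
        if message == subject:
            return short_hash
    return None


# Placeholder hashes for illustration only (newest commit first, as git prints them).
sample = """\
eeeeeee Add unit tests
ddddddd Refactor module structure
ccccccc Fix null pointer bug
bbbbbbb Add error handling
aaaaaaa Initial application setup
"""

print(short_hash_for(sample, "Fix null pointer bug"))  # prints ccccccc
```

Matching on the full subject rather than a substring avoids picking up a different commit that merely mentions the same words.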
Expected
deepseek/deepseek-v4-flash:high PASS 100% 711 in → 156 out (151 reasoning)
6086e31
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 729 in → 80 out (66 reasoning)
4e71815
JSON Schema Structured Output
(raw) { "hash": "4e71815" }
deepseek/deepseek-v4-flash:none PASS 100% 705 in → 4 out (0 reasoning)
77868ea
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 731 in → 11 out (0 reasoning)
ab76424
JSON Schema Structured Output
(raw) { "hash": "ab76424" }
mistralai/devstral-2512 PASS 100% 921 in → 8 out
3a9256a
mistralai/devstral-2512__json_schema PASS 100% 910 in → 13 out
b0f3728
JSON Schema Structured Output
(raw) {"hash": "b0f3728"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 912 in → 195 out (175 reasoning)
e2914fd
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 928 in → 134 out (103 reasoning)
03d25e5
JSON Schema Structured Output
(raw) { "hash": "03d25e5" }
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 929 in → 14 out (0 reasoning)
5fd5663
JSON Schema Structured Output
(raw) { "hash": "5fd5663" }
poolside/laguna-xs.2:high PASS 100% 949 in → 146 out (137 reasoning)
96eaa67
poolside/laguna-xs.2:high__json_schema PASS 100% 948 in → 173 out (159 reasoning)
511a7b5
JSON Schema Structured Output
(raw) {"hash": "511a7b5"}
poolside/laguna-xs.2:none PASS 100% 960 in → 8 out (0 reasoning)
8f6e3cf
poolside/laguna-xs.2:none__json_schema PASS 100% 966 in → 13 out (0 reasoning)
11586b6
JSON Schema Structured Output
(raw) {"hash": "11586b6"}
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 940 in → 39 out (0 reasoning)
d8227c5509a65cf00eaea16650628f575d14484f
Failure: Expected short hash d8227c5, got d8227c5509a65cf00eaea16650628f575d14484f
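The failing run did identify the right commit: its 40-character answer begins with the expected d8227c5, so the failure is one of output format rather than lookup. The sketch below shows how an exact-match check would produce that result. It assumes the grader compares the answer to the expected short hash as a whole string and reads the hash field from JSON-schema responses; the harness code is not shown in the source, so extract_answer and grade are hypothetical names.

```python
import json


def extract_answer(raw: str) -> str:
    """Pull the answer out of a plain-text or JSON-schema response (assumed shapes)."""
    text = raw.strip()  # assumption: surrounding whitespace is ignored
    if text.startswith("{"):
        # JSON-schema variants respond with an object such as {"hash": "..."}.
        return str(json.loads(text)["hash"]).strip()
    return text


def grade(raw: str, expected_short: str) -> bool:
    """Assumed rule: the answer must equal the expected short hash exactly."""
    return extract_answer(raw) == expected_short


# The full hash from the failing run is rejected even though its prefix matches.
print(grade("d8227c5509a65cf00eaea16650628f575d14484f", "d8227c5"))  # False
# A hypothetical JSON-schema response carrying the short hash would pass.
print(grade('{ "hash": "d8227c5" }', "d8227c5"))  # True
```

A more lenient grader could accept any answer that starts with the expected short hash, but that would no longer test the instruction to output only the short hash.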