GitBench
Show commit full hash
Tests whether the model can extract a commit's full SHA hash from git show output.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'initial' > seed.txt
  5. git add seed.txt
  6. git commit -m 'Seed commit'
  7. echo 'second' > seed.txt
  8. git add seed.txt
  9. git commit -m 'Second commit'
Prompt
Using git show -s --format=%H, what is the full SHA hash of the commit with message 'Second commit'? Output ONLY the full hash, nothing else.
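To see what the prompt's command returns, the setup commands can be replayed in a throwaway directory. The following is a minimal sketch, not the benchmark's own harness: the temporary directory, the -q flags, and the variable names second_hash and by_message are assumptions added for illustration. It reads the hash once from HEAD (the newest commit is 'Second commit') and once by searching for the commit message.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so no existing repository is touched.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"

# Replay the setup steps (-q only silences progress output).
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'initial' > seed.txt
git add seed.txt
git commit -q -m 'Seed commit'
echo 'second' > seed.txt
git add seed.txt
git commit -q -m 'Second commit'

# 'Second commit' is the latest commit, so HEAD points at it.
# -s suppresses the diff; %H prints the full hash.
second_hash=$(git show -s --format=%H HEAD)

# Alternative lookup by commit message instead of relying on HEAD.
by_message=$(git log -n 1 --format=%H --grep='^Second commit$')

echo "$second_hash"
```

A commit hash covers the author and committer timestamps, so every replay produces a different value. That is consistent with each result below reporting a different hash.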
Expected
full hash for commit message 'Second commit'
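The page does not state how a response is compared against the expected hash. As a rough sketch, assuming the checker pulls the first run of 40 lowercase hex characters out of the response before comparing, the extraction could look like this. extract_sha is a hypothetical helper, not part of GitBench, and grep -o is a common extension (GNU and BSD grep) that POSIX does not require.

```shell
#!/bin/sh

# Hypothetical helper: print the first 40-character lowercase hex run
# found in the argument, or nothing if there is none.
extract_sha() {
  printf '%s\n' "$1" | grep -oE '[0-9a-f]{40}' | head -n 1
}

# Works on a bare hash, a hash wrapped in prose, or raw JSON output.
extract_sha 'Input the SHA hash manually (f365b0cf95452f3b33e590cbacbea341f816a12c)'
# prints f365b0cf95452f3b33e590cbacbea341f816a12c
extract_sha '{ "hash": "c8871b6c0c84f1f9c73fe03016b92685fd330836" }'
# prints c8871b6c0c84f1f9c73fe03016b92685fd330836
```

This sketch would also accept the first 40 characters of a longer hex string, so a stricter checker might require the hash to be delimited by non-hex characters.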
deepseek/deepseek-v4-flash:high PASS 100% 176 in → 114 out (86 reasoning)
8af78047a2f82f3ddd50321435b012528279928d
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 179 in → 97 out (56 reasoning)
c8871b6c0c84f1f9c73fe03016b92685fd330836
JSON Schema Structured Output
(raw) { "hash": "c8871b6c0c84f1f9c73fe03016b92685fd330836" }
deepseek/deepseek-v4-flash:none PASS 100% 174 in → 19 out (0 reasoning)
55305f542a347699719fb33f245441cf23b027fd
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 177 in → 29 out (0 reasoning)
6240343120a44484fb2a9465ed11ece048fe8b3a
JSON Schema Structured Output
(raw) { "hash": "6240343120a44484fb2a9465ed11ece048fe8b3a" }
mistralai/devstral-2512 PASS 100% 204 in → 36 out
2a90031b4d6489f2cdea8bd2f25b18985ada4a1f
mistralai/devstral-2512__json_schema PASS 100% 205 in → 43 out
ef072824bb13c84f703f7d4512a8b669749a6db9
JSON Schema Structured Output
(raw) {"hash": "ef072824bb13c84f703f7d4512a8b669749a6db9"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 219 in → 398 out (314 reasoning)
7a58396ed268d4b6b189efd33075694486ca537e
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 219 in → 276 out (143 reasoning)
6ba7c30cc598902d5707d9c44c782297c76bae62
JSON Schema Structured Output
(raw) {"hash": "6ba7c30cc598902d5707d9c44c782297c76bae62" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 217 in → 44 out (0 reasoning)
Input the SHA hash manually (f365b0cf95452f3b33e590cbacbea341f816a12c)
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 219 in → 51 out (0 reasoning)
fb91c68d366f481e1504f2c6e61f3521ee203660
JSON Schema Structured Output
(raw) { "hash": "fb91c68d366f481e1504f2c6e61f3521ee203660" }
poolside/laguna-xs.2:high PASS 100% 255 in → 169 out (128 reasoning)
2107e498655e600f1dff60490c451b6e758b4948
poolside/laguna-xs.2:high__json_schema PASS 100% 248 in → 133 out (95 reasoning)
f3664f64cc4d61a68f9ef8f53def6affdc8ae4d4
JSON Schema Structured Output
(raw) {"hash":"f3664f64cc4d61a68f9ef8f53def6affdc8ae4d4"}
poolside/laguna-xs.2:none PASS 100% 254 in → 39 out (0 reasoning)
827fb8f8bc42963d09707ad6900f914a617e5926
poolside/laguna-xs.2:none__json_schema PASS 100% 251 in → 40 out (0 reasoning)
ba3183c07b992bde6f3d52782918fa244dc633b9
JSON Schema Structured Output
(raw) {"hash":"ba3183c07b992bde6f3d52782918fa244dc633b9"}