GitBench
Show commit details including author and message
Tests the ability to extract the author email from git show output, a basic commit-inspection task.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'alice@example.com'
  3. git config user.name 'Alice'
  4. echo 'hello' > readme.txt
  5. git add readme.txt
  6. git commit -m 'Add readme file'
  7. echo 'line2' >> readme.txt
  8. git add readme.txt
  9. git commit -m 'Update readme with more content'
Prompt
Using git show, what is the author email of the latest commit? Output ONLY the email address, nothing else.
Expected
alice@example.com
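A minimal sketch for reproducing this case by hand, assuming a POSIX shell with git on the PATH. It replays the setup commands in a fresh temporary directory (using mktemp is an assumption; the harness's real working directory is not documented) and then answers the prompt with git show. The -s flag suppresses the diff, and the %ae format placeholder expands to the author email.

```shell
#!/bin/sh
# Replay the GitBench setup in a throwaway repository.
# Assumption: the harness's actual working directory is unknown, so mktemp is used here.
set -eu

repo=$(mktemp -d)
cd "$repo"

git init -q
git config user.email 'alice@example.com'
git config user.name 'Alice'
echo 'hello' > readme.txt
git add readme.txt
git commit -q -m 'Add readme file'
echo 'line2' >> readme.txt
git add readme.txt
git commit -q -m 'Update readme with more content'

# Answer the prompt: -s (--no-patch) hides the diff, %ae prints the author email.
git show -s --format='%ae' HEAD
```

Running it prints alice@example.com, the expected answer. Because both commits use the same local identity, git log -1 --format=%ae would return the same address, but the prompt specifically asks for git show.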
Campaign evidence (token counts shown as input → output, with reasoning tokens in parentheses)
deepseek/deepseek-v4-flash:high PASS 100% 173 in → 74 out (67 reasoning)
alice@example.com
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 229 in → 72 out (50 reasoning)
alice@example.com
JSON Schema Structured Output
(raw) {"email": "alice@example.com"}
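The structured-output runs return a JSON object rather than a bare address. The sketch below pulls out the email field with sed. extract_email is a hypothetical helper, and a regular expression is only a naive stand-in for a real JSON parser: it assumes a flat object with a single string-valued email key and no escaped quotes, which is the only shape these runs produce.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "email" key from a flat JSON object.
# This is a regex match, not a JSON parser; nested objects and escaped quotes are not handled.
extract_email() {
  printf '%s\n' "$1" |
    sed -n 's/.*"email"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Both spacing variants seen in the campaign log yield the same address.
extract_email '{"email": "alice@example.com"}'
extract_email '{ "email": "alice@example.com" }'
```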
deepseek/deepseek-v4-flash:none PASS 100% 167 in → 6 out (0 reasoning)
alice@example.com
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 171 in → 12 out (0 reasoning)
alice@example.com
JSON Schema Structured Output
(raw) { "email": "alice@example.com" }
mistralai/devstral-2512 PASS 100% 195 in → 5 out
alice@example.com
mistralai/devstral-2512__json_schema PASS 100% 200 in → 10 out
alice@example.com
JSON Schema Structured Output
(raw) {"email": "alice@example.com"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 208 in → 67 out (69 reasoning)
alice@example.com
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 208 in → 86 out (86 reasoning)
alice@example.com
JSON Schema Structured Output
(raw) { "email": "alice@example.com" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 207 in → 5 out (0 reasoning)
alice@example.com
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 209 in → 12 out (0 reasoning)
alice@example.com
JSON Schema Structured Output
(raw) { "email": "alice@example.com" }
poolside/laguna-xs.2:high__json_schema PASS 100% 243 in → 186 out (177 reasoning)
alice@example.com
JSON Schema Structured Output
(raw) {"email": "alice@example.com"}
poolside/laguna-xs.2:none PASS 100% 243 in → 5 out (0 reasoning)
alice@example.com
poolside/laguna-xs.2:none__json_schema PASS 100% 246 in → 9 out (0 reasoning)
alice@example.com
JSON Schema Structured Output
(raw) {"email": "alice@example.com"}
poolside/laguna-xs.2:high FAIL 0% 239 in → 225 out (176 reasoning)
I'll use git show to get the author email of the latest commit. <tool_call>run_command <arg_key>command</arg_key> <arg_value>git show -s --format='%ae'</arg_value> </tool_call>
Failure: Expected 'alice@example.com', got ' I'll use git show to get the author email of the latest commit. <tool_call>run_command <arg_key>command</arg_key> <arg_value>git show -s --format='%ae'</arg_value> </tool_call> '
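The laguna-xs.2:high failure is a formatting miss rather than a wrong approach: the model emitted a tool call instead of the address itself, even though the command it proposed (git show -s --format='%ae') would print the correct email in this repository. An exact comparison against the expected string therefore fails. A minimal sketch of such a check follows; the real GitBench scorer is not shown here, so grade is a hypothetical function, and the whitespace trimming is an assumption about how the harness normalizes answers.

```shell
#!/bin/sh
# Hypothetical exact-match grader; the actual GitBench scorer may normalize differently.
# Usage: grade EXPECTED ACTUAL  -> prints PASS or FAIL, returns 0 or 1.
grade() {
  expected=$1
  # Strip leading and trailing whitespace from each line of the answer;
  # command substitution also drops trailing newlines.
  actual=$(printf '%s\n' "$2" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
  if [ "$actual" = "$expected" ]; then
    echo PASS
    return 0
  fi
  echo FAIL
  return 1
}

grade 'alice@example.com' ' alice@example.com '
grade 'alice@example.com' "I'll use git show to get the author email of the latest commit." || true
```

Under this rule, surrounding whitespace is forgiven but any extra prose or tool-call markup is not, which matches the prompt's instruction to output only the email address.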