GitBench
Show commit author date
Tests ability to extract commit author date from git show. Evaluates date-field extraction from commit metadata.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'dev@org.io'
  3. git config user.name 'Developer'
  4. git config commit.gpgsign false
  5. echo 'data' > config.yaml
  6. git add config.yaml
  7. GIT_AUTHOR_DATE='2025-06-15T10:30:00' GIT_COMMITTER_DATE='2025-06-15T10:30:00' git commit -m 'Mid-year update'
Prompt
Using git show -s --format=%ae, what is the author email of the latest commit? Output ONLY the email, nothing else.
Expected
dev@org.io
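The expected answer can be checked by hand. The following is a minimal sketch, not the benchmark's own harness: it replays the setup commands listed above in a throwaway directory and then runs the command named in the prompt. The temporary directory, the `-q` flags, and the variable names `repo`, `author_email`, and `author_date` are assumptions added for illustration.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so no existing repository is touched (assumption).
repo=$(mktemp -d)
cd "$repo"

# Setup commands from the test case; -q only silences git's progress output.
git init -q
git config user.email 'dev@org.io'
git config user.name 'Developer'
git config commit.gpgsign false
echo 'data' > config.yaml
git add config.yaml
GIT_AUTHOR_DATE='2025-06-15T10:30:00' GIT_COMMITTER_DATE='2025-06-15T10:30:00' \
  git commit -q -m 'Mid-year update'

# -s suppresses the diff; %ae prints only the author email.
author_email=$(git show -s --format=%ae)
echo "$author_email"

# %aI prints the author date in strict ISO 8601 form. The setup date carries
# no timezone, so the offset shown depends on the local timezone.
author_date=$(git show -s --format=%aI)
echo "$author_date"
```

The first line printed should match the expected value, since the author email comes from the `user.email` setting in effect when the commit was created. The second line is included only because the test title mentions the author date; its timezone offset is machine-dependent, so no fixed output is given for it here.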
deepseek/deepseek-v4-flash:high PASS 100% 162 in → 65 out (59 reasoning)
dev@org.io
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 164 in → 99 out (86 reasoning)
dev@org.io
JSON Schema Structured Output
(raw) { "email": "dev@org.io"}
deepseek/deepseek-v4-flash:none PASS 100% 162 in → 5 out (0 reasoning)
dev@org.io
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 162 in → 12 out (0 reasoning)
dev@org.io
JSON Schema Structured Output
(raw) { "email": "dev@org.io" }
mistralai/devstral-2512 PASS 100% 192 in → 5 out
dev@org.io
mistralai/devstral-2512__json_schema PASS 100% 192 in → 10 out
dev@org.io
JSON Schema Structured Output
(raw) {"email": "dev@org.io"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 205 in → 66 out (59 reasoning)
dev@org.io
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 205 in → 99 out (95 reasoning)
dev@org.io
JSON Schema Structured Output
(raw) { "email": "dev@org.io" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 205 in → 5 out (0 reasoning)
dev@org.io
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 205 in → 13 out (0 reasoning)
dev@org.io
JSON Schema Structured Output
(raw) { "email": "dev@org.io" }
poolside/laguna-xs.2:high PASS 100% 245 in → 171 out (165 reasoning)
dev@org.io
poolside/laguna-xs.2:high__json_schema PASS 100% 245 in → 232 out (222 reasoning)
dev@org.io
JSON Schema Structured Output
(raw) {"email":"dev@org.io"}
poolside/laguna-xs.2:none PASS 100% 245 in → 6 out (0 reasoning)
dev@org.io
poolside/laguna-xs.2:none__json_schema PASS 100% 245 in → 10 out (0 reasoning)
dev@org.io
JSON Schema Structured Output
(raw) {"email": "dev@org.io"}