GitBench
Show annotated tag with tagger info and message
Tests the ability to inspect an annotated tag, including its tagger information and message. Evaluates annotated-tag comprehension.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'releaser@company.com'
  3. git config user.name 'Release Bot'
  4. echo 'stable code' > main.py
  5. git add main.py
  6. git commit -m 'Stable release'
  7. git tag -a v2.0.0 -m 'Major release: stable API'
Prompt
Using git show, what is the tagger email for the annotated tag 'v2.0.0'? Output ONLY the email, nothing else.
Expected
releaser@company.com
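As a rough sketch of how the expected answer can be reproduced by hand, the script below replays the setup commands in a throwaway directory and pulls the email out of the `Tagger:` line that `git show` prints for an annotated tag. The temporary directory, the `-q` flags, the `email` variable, and the `sed` expression are illustrative assumptions, not part of the benchmark itself.

```shell
#!/bin/sh
set -e

# Replay the setup in a scratch directory (the mktemp location is arbitrary).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'releaser@company.com'
git config user.name 'Release Bot'
echo 'stable code' > main.py
git add main.py
git commit -q -m 'Stable release'
git tag -a v2.0.0 -m 'Major release: stable API'

# For an annotated tag, `git show` prints a "Tagger: Name <email>" header
# before the tag message. Keep only the text between the angle brackets.
email=$(git show v2.0.0 | sed -n 's/^Tagger: .*<\(.*\)>$/\1/p')
echo "$email"
```

The tagger identity comes from the `user.name` and `user.email` values configured before the tag was created, so the printed value should match the Expected line above.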
deepseek/deepseek-v4-flash:high PASS 100% 388 in → 63 out (49 reasoning)
releaser@company.com
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 390 in → 62 out (50 reasoning)
releaser@company.com
JSON Schema Structured Output
(raw) {"email":"releaser@company.com"}
deepseek/deepseek-v4-flash:none PASS 100% 384 in → 7 out (0 reasoning)
releaser@company.com
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 386 in → 17 out (0 reasoning)
releaser@company.com
JSON Schema Structured Output
(raw) { "email": "releaser@company.com" }
mistralai/devstral-2512 PASS 100% 463 in → 7 out
releaser@company.com
mistralai/devstral-2512__json_schema PASS 100% 461 in → 12 out
releaser@company.com
JSON Schema Structured Output
(raw) {"email": "releaser@company.com"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 482 in → 75 out (61 reasoning)
releaser@company.com
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 480 in → 123 out (106 reasoning)
releaser@company.com
JSON Schema Structured Output
(raw) { "email": "releaser@company.com" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 472 in → 7 out (0 reasoning)
releaser@company.com
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 476 in → 14 out (0 reasoning)
releaser@company.com
JSON Schema Structured Output
(raw) { "email": "releaser@company.com" }
poolside/laguna-xs.2:high PASS 100% 521 in → 211 out (203 reasoning)
releaser@company.com
poolside/laguna-xs.2:high__json_schema PASS 100% 507 in → 233 out (222 reasoning)
releaser@company.com
JSON Schema Structured Output
(raw) {"email": "releaser@company.com"}
poolside/laguna-xs.2:none PASS 100% 521 in → 8 out (0 reasoning)
releaser@company.com
poolside/laguna-xs.2:none__json_schema PASS 100% 513 in → 12 out (0 reasoning)
releaser@company.com
JSON Schema Structured Output
(raw) {"email": "releaser@company.com"}