GitBench
Show lightweight tag pointing to a commit
Tests the ability to inspect a lightweight tag that points to a commit. Evaluates tag-to-commit resolution.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'v1 content' > app.txt
  5. git add app.txt
  6. git commit -m 'Release version 1.0'
  7. git tag v1.0.0
  8. echo 'v2 content' > app.txt
  9. git add app.txt
  10. git commit -m 'Release version 1.1'
  11. git tag v1.1.0
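The setup above can be replayed outside the benchmark with a short POSIX shell script. This is a sketch that assumes a Unix-like host with git and mktemp available; the throwaway directory is an assumption, not something GitBench specifies. Because git tag is run without -a, -s, or -m, both tags are lightweight: each is a plain ref under refs/tags/ that holds a commit hash, so git cat-file -t reports "commit" for it rather than "tag".

```shell
#!/bin/sh
# Replay the GitBench setup in a throwaway directory (assumption: mktemp -d
# is available; the benchmark's own sandbox is not described).
set -e
repo=$(mktemp -d)
cd "$repo"

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'v1 content' > app.txt
git add app.txt
git commit -q -m 'Release version 1.0'   # -q only silences commit output
git tag v1.0.0                           # no -a/-s/-m: lightweight tag
echo 'v2 content' > app.txt
git add app.txt
git commit -q -m 'Release version 1.1'
git tag v1.1.0

# A lightweight tag names the commit directly, so the object type is "commit".
# An annotated tag would name a tag object and report "tag" instead.
type_v1=$(git cat-file -t v1.0.0)
type_v11=$(git cat-file -t v1.1.0)
echo "v1.0.0 -> $type_v1"
echo "v1.1.0 -> $type_v11"
```

Running it prints the object type each tag resolves to; v1.0.0 sits one commit behind HEAD, which is why answering the prompt requires resolving the tag rather than reading the latest commit.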
Prompt
Using git show, what is the commit message of the commit that the lightweight tag 'v1.0.0' points to? Output ONLY the commit message, nothing else.
Expected
Release version 1.0
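One way to produce the expected answer with git show is sketched below against a compact replay of the same setup (the temporary directory is again an assumption). The -s flag (--no-patch) suppresses the diff, and --format=%s prints only the subject line, which here is the entire commit message. Since v1.0.0 is lightweight, git show resolves it straight to the commit, so the answer is unaffected by HEAD having moved on to v1.1.0. For an annotated tag, git show would also print the tag object's own name, tagger, and message before the commit.

```shell
#!/bin/sh
# Compact replay of the setup (assumption: run in a throwaway directory).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'v1 content' > app.txt && git add app.txt
git commit -q -m 'Release version 1.0' && git tag v1.0.0
echo 'v2 content' > app.txt && git add app.txt
git commit -q -m 'Release version 1.1' && git tag v1.1.0

# -s hides the patch; --format=%s prints just the subject of the commit
# the lightweight tag points to, even though HEAD is now at v1.1.0.
msg=$(git show -s --format=%s v1.0.0)
echo "$msg"
```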
Results
deepseek/deepseek-v4-flash:high PASS 100% 551 in → 106 out (98 reasoning)
Release version 1.0
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 567 in → 100 out (84 reasoning)
Release version 1.0
JSON Schema Structured Output
(raw) { "commit_message": "Release version 1.0" }
deepseek/deepseek-v4-flash:none PASS 100% 567 in → 7 out (0 reasoning)
Release version 1.0
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 563 in → 14 out (0 reasoning)
Release version 1.0
JSON Schema Structured Output
(raw) { "commit_message": "Release version 1.0" }
mistralai/devstral-2512 PASS 100% 680 in → 7 out
Release version 1.0
mistralai/devstral-2512__json_schema PASS 100% 676 in → 13 out
Release version 1.0
JSON Schema Structured Output
(raw) {"commit_message": "Release version 1.0"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 689 in → 137 out (109 reasoning)
Release version 1.0
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 701 in → 199 out (156 reasoning)
Release version 1.0
JSON Schema Structured Output
(raw) { "commit_message": "Release version 1.0" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 697 in → 7 out (0 reasoning)
Release version 1.0
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 705 in → 17 out (0 reasoning)
Release version 1.0
JSON Schema Structured Output
(raw) { "commit_message": " Release version 1.0" }
poolside/laguna-xs.2:high PASS 100% 735 in → 216 out (207 reasoning)
Release version 1.0
poolside/laguna-xs.2:high__json_schema PASS 100% 739 in → 428 out (414 reasoning)
Release version 1.0
JSON Schema Structured Output
(raw) {"commit_message": "Release version 1.0"}
poolside/laguna-xs.2:none PASS 100% 719 in → 8 out (0 reasoning)
Release version 1.0
poolside/laguna-xs.2:none__json_schema PASS 100% 733 in → 12 out (0 reasoning)
Release version 1.0
JSON Schema Structured Output
(raw) {"commit_message":"Release version 1.0"}
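The nvidia/nemotron-3-nano-30b-a3b:none__json_schema run returned " Release version 1.0" with a leading space in its raw JSON and was still graded PASS, which suggests that surrounding whitespace is trimmed before the answer is compared with the expected string. The check below is a hypothetical sketch of such a comparison; the normalize and matches helpers are illustrative names, not GitBench's actual grading code.

```shell
#!/bin/sh
# Hypothetical grader check (assumption inferred from the results above):
# trim leading/trailing whitespace, then require an exact match.
set -e
expected='Release version 1.0'

normalize() {
  # Strip leading and trailing whitespace from a single-line answer.
  printf '%s' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

matches() {
  [ "$(normalize "$1")" = "$expected" ]
}

for answer in 'Release version 1.0' ' Release version 1.0' 'Release version 1.1'; do
  if matches "$answer"; then
    echo "PASS: '$answer'"
  else
    echo "FAIL: '$answer'"
  fi
done
```

Under this assumption the padded answer passes while a wrong commit subject still fails, since only whitespace, not content, is normalized.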