GitBench
Show tag details
Tests the ability to show detailed information about a tag. Evaluates tag inspection with git show.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'App' > app.txt
  5. git add app.txt
  6. git commit -m 'Release candidate'
  7. git tag -a v1.0.0-rc1 -m 'Release candidate 1'
Prompt
Show the details (message, tagger, date) of the tag 'v1.0.0-rc1'. Output ONLY the git command, nothing else.
Expected
git show v1.0.0-rc1
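A minimal sketch for checking the expected answer locally, assuming a POSIX shell and Git on the PATH. It replays the setup commands in a scratch directory created with mktemp (the scratch directory is an assumption, not part of the benchmark), confirms that v1.0.0-rc1 is an annotated tag object, and captures what git show prints for it.

```shell
#!/bin/sh
# Replay the GitBench setup in a throwaway directory.
# Assumption: mktemp -d is available; the benchmark's own harness is not shown.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'App' > app.txt
git add app.txt
git commit -q -m 'Release candidate'
git tag -a v1.0.0-rc1 -m 'Release candidate 1'

# -a creates an annotated tag: a separate tag object that records the
# tagger, date and message. A lightweight tag would resolve to "commit".
kind=$(git cat-file -t v1.0.0-rc1)

# The expected answer. For a tag object, git show prints the tag header
# and message first, then the tagged commit and its diff.
out=$(git show v1.0.0-rc1)
printf '%s\n' "$out"
```

Because the tag header (tag name, Tagger, Date) and the tag message come first in the output, the single command covers all three details the prompt asks for.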
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 60 in → 124 out (107 reasoning)
git show v1.0.0-rc1
JSON Schema Structured Output
(raw) {"command":"git show v1.0.0-rc1"}
mistralai/devstral-2512 PASS 100% 60 in → 33 out
```bash git show v1.0.0-rc1 --format="%H %an %ad %s" --no-patch ```
mistralai/devstral-2512__json_schema PASS 100% 61 in → 17 out
git show v1.0.0-rc1
JSON Schema Structured Output
(raw) {"command": "git show v1.0.0-rc1"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 74 in → 1,367 out (1,171 reasoning)
git show v1.0.0-rc1
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 74 in → 657 out (540 reasoning)
git show v1.0.0-rc1 --pretty='%b%nTagger: %an%nDate: %ad'
JSON Schema Structured Output
(raw) { "command": "git show v1.0.0-rc1 --pretty='%b%nTagger: %an%nDate: %ad'" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 72 in → 33 out (0 reasoning)
git show --decorate=v --pretty="%b%x09%an%x09%ci" v1.0.0-rc1
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 74 in → 49 out (0 reasoning)
git show v1.0.0-rc1 --pretty=format:'Message: %s\nTagger: %an\nDate: %ai' --date=iso-strict
JSON Schema Structured Output
(raw) { "command": "git show v1.0.0-rc1 --pretty=format:'Message: %s\\nTagger: %an\\nDate: %ai' --date=iso-strict" }
poolside/laguna-xs.2:high PASS 100% 109 in → 160 out (147 reasoning)
git show v1.0.0-rc1
poolside/laguna-xs.2:high__json_schema PASS 100% 109 in → 117 out (100 reasoning)
git show v1.0.0-rc1
JSON Schema Structured Output
(raw) {"command": "git show v1.0.0-rc1"}
poolside/laguna-xs.2:none PASS 100% 109 in → 23 out (0 reasoning)
git show v1.0.0-rc1 --format="%s%n%an%n%ad"
poolside/laguna-xs.2:none__json_schema PASS 100% 108 in → 38 out (0 reasoning)
git show v1.0.0-rc1 --format="%(message)%n%(taggername)%n%(date)%n" --no-show-signature
JSON Schema Structured Output
(raw) {"command": "git show v1.0.0-rc1 --format=\"%(message)%n%(taggername)%n%(date)%n\" --no-show-signature"}
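Several passing answers above append --format or --pretty to git show. A hedged sketch, assuming a reasonably recent Git: with a tag argument, git show still prints the tag name line and the tag message, and applies the format string only to the tagged commit, so placeholders such as %s, %an and %ad describe the commit rather than the tag. The example uses %s because the commit subject ('Release candidate') differs from the tag message ('Release candidate 1'); the scratch directory is again an assumption.

```shell
#!/bin/sh
# Same setup as the benchmark, in a throwaway directory (assumption).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'App' > app.txt
git add app.txt
git commit -q -m 'Release candidate'
git tag -a v1.0.0-rc1 -m 'Release candidate 1'

# -s suppresses the diff. %s expands to the subject of the tagged *commit*,
# while the tag name line and tag message are printed regardless of --format.
out=$(git show -s --format='subject=%s' v1.0.0-rc1)
printf '%s\n' "$out"
```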
deepseek/deepseek-v4-flash:high FAIL 66.7% 59 in → 351 out (260 reasoning)
git tag -l v1.0.0-rc1 --format='%(contents:subject) %(taggername) %(taggerdate)'
Failure: Failed assertions: [{'type': 'model_output', 'contains': 'git show'}]
deepseek/deepseek-v4-flash:none FAIL 66.7% 60 in → 45 out (0 reasoning)
git log --format="%h %s%nAuthor: %an <%ae>%nDate: %ad" --tags="v1.0.0-rc1" --all | head -4
Failure: Failed assertions: [{'type': 'model_output', 'contains': 'git show'}]
deepseek/deepseek-v4-flash:none__json_schema FAIL 66.7% 59 in → 47 out (0 reasoning)
git tag -l v1.0.0-rc1 --format='%(refname:short) %(taggerdate:iso8601) %(taggername)'
JSON Schema Structured Output
(raw) { "command": "git tag -l v1.0.0-rc1 --format='%(refname:short) %(taggerdate:iso8601) %(taggername)'" }
Failure: Failed assertions: [{'type': 'model_output', 'contains': 'git show'}]
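All three failures carry the same assertion, {'type': 'model_output', 'contains': 'git show'}. The sketch below is a hypothetical re-creation of that check as a plain, case-sensitive substring test; the function name contains_git_show and the case sensitivity are assumptions, since the harness itself is not shown. It illustrates why answers built on git tag -l or git log fail even though git tag -l with --format can print the tagger name and date, while the fenced devstral-2512 answer still passes.

```shell
#!/bin/sh
set -e

# Hypothetical stand-in for the 'contains' assertion: PASS if the model
# output contains the literal substring "git show", FAIL otherwise.
contains_git_show() {
  case "$1" in
    *"git show"*) echo PASS ;;
    *) echo FAIL ;;
  esac
}

# Model outputs copied from the evidence above.
expected=$(contains_git_show 'git show v1.0.0-rc1')
fenced=$(contains_git_show '```bash git show v1.0.0-rc1 --format="%H %an %ad %s" --no-patch ```')
tag_list=$(contains_git_show "git tag -l v1.0.0-rc1 --format='%(contents:subject) %(taggername) %(taggerdate)'")
log_cmd=$(contains_git_show 'git log --format="%h %s%nAuthor: %an <%ae>%nDate: %ad" --tags="v1.0.0-rc1" --all | head -4')

printf '%s\n' "expected: $expected" "fenced: $fenced" "tag -l: $tag_list" "log: $log_cmd"
```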