GitBench
Delete a tag
Tests the ability to delete a local Git tag, evaluating knowledge of the tag-deletion command.

These commands set up the repository before the model sees the prompt. They define the starting file structure and Git history: two commits, tagged 'v0.1-beta' and 'v1.0', with nothing left staged.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'App' > app.txt
  5. git add app.txt
  6. git commit -m 'Initial commit'
  7. git tag v0.1-beta
  8. echo 'Release' > release.txt
  9. git add release.txt
  10. git commit -m 'Release v1.0'
  11. git tag v1.0
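The setup steps above can be replayed as a standalone bash script to inspect the starting state. This is a minimal sketch: it assumes bash and a local git install, and it uses a mktemp directory in place of whatever working directory the real harness uses. The -q flags only silence git's progress output.

```bash
#!/usr/bin/env bash
# Replay the GitBench "Delete a tag" setup in a throwaway directory.
# Assumption: mktemp stands in for the harness's own fresh directory.
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'App' > app.txt
git add app.txt
git commit -q -m 'Initial commit'
git tag v0.1-beta                 # lightweight tag on the first commit
echo 'Release' > release.txt
git add release.txt
git commit -q -m 'Release v1.0'
git tag v1.0                      # lightweight tag on the second commit

# List the tags the model starts with (sorted by name: v0.1-beta, then v1.0).
git tag -l
```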
Prompt
Delete the 'v0.1-beta' tag. Output ONLY the git command, nothing else.
Expected
git tag -d v0.1-beta
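The expected answer only has to remove the local tag. The failure entries on this page quote one grader assertion: the output of 'git tag -l' must not contain 'v0.1-beta'. The sketch below applies the expected command to a fresh copy of the setup and mimics that check with a plain substring test. How the real harness evaluates the assertion is an assumption here, not taken from GitBench.

```bash
#!/usr/bin/env bash
# Apply the expected answer, then mimic the quoted assertion:
#   {'type': 'git_output', 'command': 'git tag -l', 'not_contains': 'v0.1-beta'}
# The substring check below is an assumed stand-in for the harness's logic.
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'App' > app.txt
git add app.txt
git commit -q -m 'Initial commit'
git tag v0.1-beta
echo 'Release' > release.txt
git add release.txt
git commit -q -m 'Release v1.0'
git tag v1.0

# The expected answer: delete the tag from the local repository.
git tag -d v0.1-beta

# "not_contains" check on the output of `git tag -l`.
tags="$(git tag -l)"
case "$tags" in
  *v0.1-beta*) echo "FAIL: v0.1-beta is still listed"; exit 1 ;;
  *)           echo "PASS: remaining tags: $tags" ;;
esac
```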
deepseek/deepseek-v4-flash:high PASS 100% 58 in → 87 out (76 reasoning)
git tag -d v0.1-beta
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 120 in → 80 out (58 reasoning)
git tag -d v0.1-beta
JSON Schema Structured Output
(raw) {"command": "git tag -d v0.1-beta"}
deepseek/deepseek-v4-flash:none PASS 100% 56 in → 10 out (0 reasoning)
git tag -d v0.1-beta
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 55 in → 18 out (0 reasoning)
git tag -d v0.1-beta
JSON Schema Structured Output
(raw) { "command": "git tag -d v0.1-beta" }
mistralai/devstral-2512__json_schema PASS 100% 59 in → 15 out
git tag -d v0.1-beta
JSON Schema Structured Output
(raw) {"command": "git tag -d v0.1-beta"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 74 in → 186 out (168 reasoning)
git tag -d v0.1-beta
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 72 in → 123 out (98 reasoning)
git tag -d v0.1-beta
JSON Schema Structured Output
(raw) { "command": "git tag -d v0.1-beta" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 73 in → 10 out (0 reasoning)
git tag -d v0.1-beta
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 73 in → 29 out (0 reasoning)
git tag -d v0.1-beta && git push origin --delete v0.1-beta
JSON Schema Structured Output
(raw) { "command": "git tag -d v0.1-beta && git push origin --delete v0.1-beta" }
poolside/laguna-xs.2:high PASS 100% 111 in → 275 out (265 reasoning)
git tag -d v0.1-beta
poolside/laguna-xs.2:high__json_schema PASS 100% 109 in → 174 out (141 reasoning)
git tag -d v0.1-beta && git push origin :refs/tags/v0.1-beta
JSON Schema Structured Output
(raw) { "command": "git tag -d v0.1-beta && git push origin :refs/tags/v0.1-beta" }
mistralai/devstral-2512 FAIL 50% 60 in → 15 out
```bash git tag -d v0.1-beta ```
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'not_contains': 'v0.1-beta'}]
poolside/laguna-xs.2:none FAIL 50% 109 in → 12 out (0 reasoning)
git push --delete origin v0.1-beta
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'not_contains': 'v0.1-beta'}]
poolside/laguna-xs.2:none__json_schema FAIL 50% 111 in → 16 out (0 reasoning)
git push --delete origin v0.1-beta
JSON Schema Structured Output
(raw) {"command": "git push --delete origin v0.1-beta"}
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'not_contains': 'v0.1-beta'}]
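The results above split on one point. 'git push --delete origin v0.1-beta' removes the tag only on the remote, so the local tag survives and 'git tag -l' still lists it. The passing chained answers run 'git tag -d' first. The sketch below contrasts the two answers on fresh copies of the setup. The setup configures no remote, so in this reproduction every push to "origin" fails. The helper make_repo is hypothetical and is not part of GitBench.

```bash
#!/usr/bin/env bash
# Contrast two recorded answers, each run against a fresh copy of the setup repo.
# The setup adds no remote, so any push to "origin" fails in this sketch.
set -uo pipefail   # deliberately no -e: the push steps are expected to fail

# Hypothetical helper: rebuild the starting repo in a new temp dir, print its path.
make_repo() {
  local dir
  dir="$(mktemp -d)"
  (
    cd "$dir" || exit 1
    git init -q
    git config user.email 'test@test.com'
    git config user.name 'Test User'
    echo 'App' > app.txt
    git add app.txt
    git commit -q -m 'Initial commit'
    git tag v0.1-beta
    echo 'Release' > release.txt
    git add release.txt
    git commit -q -m 'Release v1.0'
    git tag v1.0
  ) >/dev/null
  echo "$dir"
}

# A) Remote-only deletion, as in the failing laguna-xs.2:none answers.
a="$(make_repo)"
git -C "$a" push --delete origin v0.1-beta 2>/dev/null
rc_a=$?
tags_a="$(git -C "$a" tag -l)"

# B) Local deletion chained with a push, as in the passing JSON-schema answers.
b="$(make_repo)"
git -C "$b" tag -d v0.1-beta >/dev/null &&
  git -C "$b" push origin --delete v0.1-beta 2>/dev/null
rc_b=$?
tags_b="$(git -C "$b" tag -l)"

echo "A: exit=$rc_a, tags: ${tags_a//$'\n'/ }"
echo "B: exit=$rc_b, tags: ${tags_b//$'\n'/ }"
```

In B the push also fails here, but the local 'git tag -d' has already run. The quoted assertion inspects only the output of 'git tag -l'.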