: GitBench
List all tags
Tests ability to list all tags in a repository. Evaluates tag-enumeration command knowledge.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. 01 git init
  2. 02 git config user.email 'test@test.com'
  3. 03 git config user.name 'Test User'
  4. 04 echo 'v1' > app.txt
  5. 05 git add app.txt
  6. 06 git commit -m 'v1'
  7. 07 git tag v1.0
  8. 08 echo 'v2' > app.txt
  9. 09 git add app.txt
  10. 10 git commit -m 'v2'
  11. 11 git tag v2.0
  12. 12 echo 'v3' > app.txt
  13. 13 git add app.txt
  14. 14 git commit -m 'v3'
  15. 15 git tag v3.0
Prompt
List all git tags in the repository. Output ONLY the git command, nothing else.
Expected
git tag -l
Loading campaign evidence…
deepseek/deepseek-v4-flash:high PASS 100% 63 in → 67 out (64 reasoning)
git tag
deepseek/deepseek-v4-flash:none PASS 100% 62 in → 3 out (0 reasoning)
git tag
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 63 in → 11 out (0 reasoning)
git tag
JSON Schema Structured Output
(raw) { "command": "git tag" }
mistralai/devstral-2512 PASS 100% 68 in → 8 out
```bash git tag ```
mistralai/devstral-2512__json_schema PASS 100% 66 in → 8 out
git tag
JSON Schema Structured Output
(raw) {"command": "git tag"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 82 in → 206 out (196 reasoning)
git tag
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 80 in → 155 out (150 reasoning)
git tag
JSON Schema Structured Output
(raw) { "command": "git tag" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 78 in → 5 out (0 reasoning)
git tag -l
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 82 in → 15 out (0 reasoning)
git tag
JSON Schema Structured Output
(raw) { "command": "git tag" }
poolside/laguna-xs.2:high PASS 100% 116 in → 247 out (242 reasoning)
git tag
poolside/laguna-xs.2:high__json_schema PASS 100% 116 in → 245 out (236 reasoning)
git tag
JSON Schema Structured Output
(raw) {"command": "git tag"}
poolside/laguna-xs.2:none PASS 100% 118 in → 6 out (0 reasoning)
git tag -l
poolside/laguna-xs.2:none__json_schema PASS 100% 119 in → 10 out (0 reasoning)
git tag -l
JSON Schema Structured Output
(raw) {"command": "git tag -l"}
Invalid JSON. Output:
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)