GitBench
List tags matching a pattern
Tests the ability to list only the tags that match a glob pattern.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'v1' > app.txt
  5. git add app.txt
  6. git commit -m 'v1'
  7. git tag v1.0
  8. echo 'v2' > app.txt
  9. git add app.txt
  10. git commit -m 'v2'
  11. git tag v2.0
  12. echo 'beta' > app.txt
  13. git add app.txt
  14. git commit -m 'beta'
  15. git tag beta-1
  16. echo 'rc' > app.txt
  17. git add app.txt
  18. git commit -m 'rc'
  19. git tag rc-1
Prompt
List only git tags matching the pattern 'v*'. Output ONLY the git command, nothing else.
Expected
git tag -l 'v*'
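To see what the expected command returns against this fixture, the setup can be replayed in a throwaway directory. The following is a minimal sketch, not part of GitBench itself: the function name list_matching_tags, the use of mktemp, and the -q flags are assumptions added for illustration, and the loop condenses the setup steps above into one commit-and-tag pass per version.

```shell
#!/bin/sh
# Hypothetical helper (not from GitBench): rebuild the fixture repo in a
# temporary directory and print the tags that match the given glob pattern.
list_matching_tags() {
  pattern=$1
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q
    git config user.email 'test@test.com'
    git config user.name 'Test User'
    # Each "content:tag" pair becomes one commit plus one lightweight tag,
    # mirroring the setup steps listed above.
    for pair in v1:v1.0 v2:v2.0 beta:beta-1 rc:rc-1; do
      content=${pair%%:*}
      tag=${pair#*:}
      echo "$content" > app.txt
      git add app.txt
      git commit -q -m "$content"
      git tag "$tag"
    done
    # The pattern is quoted so that git, not the shell, expands the glob.
    git tag -l "$pattern"
  )
  status=$?
  rm -rf "$repo"
  return $status
}
```

Calling list_matching_tags 'v*' should print v1.0 and v2.0 and leave out beta-1 and rc-1. Quoting the pattern matters in general: an unquoted v* would be expanded by the shell first if the working directory contained a file whose name starts with v.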
deepseek/deepseek-v4-flash:high PASS 100% 75 in → 77 out (64 reasoning)
git tag -l 'v*'
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 137 in → 95 out (78 reasoning)
git tag -l 'v*'
JSON Schema Structured Output
(raw) {"command": "git tag -l 'v*'"}
deepseek/deepseek-v4-flash:none PASS 100% 75 in → 9 out (0 reasoning)
git tag -l 'v*'
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 72 in → 17 out (0 reasoning)
git tag --list 'v*'
JSON Schema Structured Output
(raw) { "command": "git tag --list 'v*'" }
mistralai/devstral-2512 PASS 100% 79 in → 13 out
```bash git tag -l 'v*' ```
mistralai/devstral-2512__json_schema PASS 100% 80 in → 13 out
git tag -l 'v*'
JSON Schema Structured Output
(raw) {"command": "git tag -l 'v*'"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 93 in → 247 out (228 reasoning)
git tag --list 'v*'
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 95 in → 267 out (246 reasoning)
git tag -l 'v*'
JSON Schema Structured Output
(raw) { "command": "git tag -l 'v*'" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 92 in → 8 out (0 reasoning)
git tag --list 'v*'
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 94 in → 16 out (0 reasoning)
git tag --list 'v*'
JSON Schema Structured Output
(raw) { "command": "git tag --list 'v*'" }
poolside/laguna-xs.2:high PASS 100% 130 in → 217 out (207 reasoning)
git tag -l "v*"
poolside/laguna-xs.2:high__json_schema PASS 100% 132 in → 171 out (152 reasoning)
git tag -l 'v*'
JSON Schema Structured Output
(raw) { "command": "git tag -l 'v*'" }
poolside/laguna-xs.2:none PASS 100% 130 in → 9 out (0 reasoning)
git tag -l 'v*'
poolside/laguna-xs.2:none__json_schema PASS 100% 131 in → 13 out (0 reasoning)
git tag -l 'v*'
JSON Schema Structured Output
(raw) {"command": "git tag -l 'v*'"}
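The raw responses above arrive in three shapes: a bare command, a command wrapped in a Markdown code fence, and a JSON object with a "command" field. The page does not say how GitBench reduces these to a single command before scoring, so the following is only a sketch of one possible normalization step. The function name extract_command is hypothetical, and the sed patterns assume a simple one-field JSON object with no escaped quotes inside the value.

```shell
#!/bin/sh
# Hypothetical helper (not GitBench's actual grader): reduce a raw model
# response to the bare command string.
extract_command() {
  printf '%s\n' "$1" |
    sed -e 's/^[[:space:]]*{[[:space:]]*"command"[[:space:]]*:[[:space:]]*"\(.*\)"[[:space:]]*}[[:space:]]*$/\1/' \
        -e 's/^```[a-z]*[[:space:]]*//' \
        -e 's/[[:space:]]*```$//' \
        -e '/^$/d'
}

# The three shapes seen in the results all reduce to the same text.
extract_command "git tag -l 'v*'"
extract_command '```bash git tag -l '\''v*'\'' ```'
extract_command '{"command": "git tag -l '\''v*'\''"}'
```

Text normalization alone would not treat git tag --list 'v*' or git tag -l "v*" as equal to the expected string, yet both are marked PASS above. Comparing the output of the candidate and expected commands in the fixture repo is one way a grader could accept them, though that too is an assumption about how the benchmark works.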