GitBench
List tags sorted by version
Tests the ability to list Git tags sorted in version order.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'v1' > app.txt
  5. git add app.txt
  6. git commit -m 'v1'
  7. git tag v1.0
  8. echo 'v2' > app.txt
  9. git add app.txt
  10. git commit -m 'v2'
  11. git tag v10.0
  12. echo 'v3' > app.txt
  13. git add app.txt
  14. git commit -m 'v3'
  15. git tag v2.0
  16. echo 'v4' > app.txt
  17. git add app.txt
  18. git commit -m 'v4'
  19. git tag v1.10
Prompt
List all tags sorted by version number in ascending order. Output ONLY the git command, nothing else.
Expected
git tag -l --sort=version:refname
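To see what the expected command prints, the setup can be replayed in a throwaway directory. This is only a sketch: the `mktemp -d` scratch directory, the `-q` flags, and the loop over `message:tag` pairs are conveniences added here and are not part of the benchmark setup.

```shell
#!/bin/sh
set -e

# Work in a scratch directory (an assumption; the benchmark uses its own repo).
scratch=$(mktemp -d)
cd "$scratch"

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'

# Each pair is "commit message:tag name", in the same order as the setup steps.
for pair in v1:v1.0 v2:v10.0 v3:v2.0 v4:v1.10; do
  msg=${pair%%:*}
  tag=${pair#*:}
  echo "$msg" > app.txt
  git add app.txt
  git commit -q -m "$msg"
  git tag "$tag"
done

# The expected answer: list tags, comparing numeric components as numbers.
sorted=$(git tag -l --sort=version:refname)
echo "$sorted"
```

With these four tags the output should be v1.0, v1.10, v2.0, v10.0, one per line. The tag names were evidently chosen so that version order differs from plain string order, where v10.0 would come before v2.0.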
deepseek/deepseek-v4-flash:high PASS 100% 80 in → 713 out (702 reasoning)
git tag --sort=version:refname
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 79 in → 228 out (209 reasoning)
git tag --sort=version:refname
JSON Schema Structured Output
(raw) { "command": "git tag --sort=version:refname" }
deepseek/deepseek-v4-flash:none PASS 100% 77 in → 10 out (0 reasoning)
git tag --sort=version:refname
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 83 in → 18 out (0 reasoning)
git tag --sort=version:refname
JSON Schema Structured Output
(raw) { "command": "git tag --sort=version:refname" }
mistralai/devstral-2512 PASS 100% 86 in → 15 out
```bash git tag --sort=version:refname ```
mistralai/devstral-2512__json_schema PASS 100% 85 in → 14 out
git tag -l | sort -V
JSON Schema Structured Output
(raw) {"command": "git tag -l | sort -V"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 99 in → 729 out (670 reasoning)
git tag --list --sort=v:refname
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 98 in → 807 out (802 reasoning)
git tag --sort=v
JSON Schema Structured Output
(raw) { "command": "git tag --sort=v" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 100 in → 35 out (0 reasoning)
git tag --list --sort=-v:refname | sort -t. -k1,1n -k2,2n -k3,3n
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 99 in → 47 out (0 reasoning)
git tag --list --sort=-v:refname | sed 's/.*\(v\?[0-9.]*\).*/\1/' | sort -V
JSON Schema Structured Output
(raw) { "command": "git tag --list --sort=-v:refname | sed 's/.*\\(v\\?[0-9.]*\\).*/\\1/' | sort -V" }
poolside/laguna-xs.2:high PASS 100% 136 in → 397 out (385 reasoning)
git tag --sort=version:refname
poolside/laguna-xs.2:high__json_schema PASS 100% 135 in → 297 out (284 reasoning)
git tag | sort -V
JSON Schema Structured Output
(raw) {"command": "git tag | sort -V"}
poolside/laguna-xs.2:none PASS 100% 138 in → 10 out (0 reasoning)
git tag -l | sort -V
poolside/laguna-xs.2:none__json_schema PASS 100% 139 in → 16 out (0 reasoning)
git tag sort -V
JSON Schema Structured Output
(raw) { "command": "git tag sort -V" }
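Several of the answers above pipe the tag list through `sort -V` instead of using `--sort`. As a rough check, assuming a `sort` that supports `-V` (a GNU coreutils extension, not part of POSIX), the four tag names from the setup can be fed in directly without a repository:

```shell
#!/bin/sh
set -e

# The four tag names from the setup, in creation order.
tags=$(printf '%s\n' v1.0 v10.0 v2.0 v1.10)

# Version sort in the shell rather than in Git (assumes sort supports -V).
by_version=$(printf '%s\n' "$tags" | sort -V)
echo "$by_version"
```

On this tag set the result should match the order given by `git tag -l --sort=version:refname`: v1.0, v1.10, v2.0, v10.0. The two approaches are not guaranteed to agree on every tag name, for example names with pre-release suffixes.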