GitBench
Fetch tags from a bare remote
Tests the ability to fetch tags from a bare remote repository after the local copies of those tags have been deleted.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'App' > app.txt
  5. git add app.txt
  6. git commit -m 'Initial release'
  7. git tag v1.0
  8. echo 'New' > new.txt
  9. git add new.txt
  10. git commit -m 'Second release'
  11. git tag v2.0
  12. mkdir -p ../remote-bare && cd ../remote-bare && git init --bare
  13. git remote add origin ../remote-bare
  14. git push origin main --tags
  15. git tag -d v1.0
  16. git tag -d v2.0
Prompt
Fetch all tags from the remote 'origin'. Output ONLY the git command, nothing else.
Expected
git fetch --tags origin
deepseek/deepseek-v4-flash:high PASS 100% 43 in → 53 out (46 reasoning)
git fetch --tags origin
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 44 in → 57 out (43 reasoning)
git fetch origin --tags
JSON Schema Structured Output
(raw) { "command": "git fetch origin --tags" }
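The structured-output variants return a JSON object whose `command` field holds the answer, as the raw line above shows. A minimal sketch of how a harness might pull the command out of such a response, assuming the schema has a single string field named `command`; `extract_command` is a hypothetical helper, not part of GitBench:

```python
import json


def extract_command(raw: str) -> str:
    """Return the `command` string from a structured-output response.

    Hypothetical helper: assumes the response is a JSON object with a
    string field named "command", as in the raw outputs on this page.
    """
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    command = payload.get("command")
    if not isinstance(command, str) or not command.strip():
        raise ValueError("missing or empty 'command' field")
    return command.strip()


print(extract_command('{ "command": "git fetch origin --tags" }'))
```

Whitespace inside the JSON does not matter, so the compact and spaced raw outputs seen in the results parse to the same command string.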
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 44 in → 13 out (0 reasoning)
git fetch origin --tags
JSON Schema Structured Output
(raw) { "command": "git fetch origin --tags" }
mistralai/devstral-2512__json_schema PASS 100% 48 in → 11 out
git fetch origin --tags
JSON Schema Structured Output
(raw) {"command": "git fetch origin --tags"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 60 in → 93 out (105 reasoning)
git fetch --tags origin
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 60 in → 137 out (140 reasoning)
git fetch --tags origin
JSON Schema Structured Output
(raw) {"command":"git fetch --tags origin"}
poolside/laguna-xs.2:high PASS 100% 98 in → 94 out (86 reasoning)
git fetch origin --tags
poolside/laguna-xs.2:high__json_schema PASS 100% 98 in → 109 out (92 reasoning)
git fetch origin --tags
JSON Schema Structured Output
(raw) { "command": "git fetch origin --tags" }
nvidia/nemotron-3-nano-30b-a3b:none FAIL 50% 60 in → 31 out (0 reasoning)
git tag -a v1.0 -m "Initial release"
git tag -a v1.0.1 -m "Second release"
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'contains': 'v2.0'}]
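The 50% score with one failed assertion is consistent with two `git_output` assertions (one per tag) scored as the fraction that pass. The sketch below illustrates that reading; the scoring rule, the substring match, and the `run` callback are assumptions, not the documented GitBench implementation:

```python
from typing import Callable


def score_assertions(
    assertions: list[dict], run: Callable[[str], str]
) -> tuple[float, list[dict]]:
    """Return (fraction of assertions passed, list of failed assertions).

    Assumption: a `git_output` assertion passes when its `contains`
    string is a substring of the command's output. `run` is a
    hypothetical callback that executes a command and returns stdout.
    """
    failed = []
    for assertion in assertions:
        if assertion["type"] != "git_output":
            raise ValueError(f"unsupported assertion type: {assertion['type']}")
        if assertion["contains"] not in run(assertion["command"]):
            failed.append(assertion)
    passed = len(assertions) - len(failed)
    return (passed / len(assertions) if assertions else 0.0), failed


assertions = [
    {"type": "git_output", "command": "git tag -l", "contains": "v1.0"},
    {"type": "git_output", "command": "git tag -l", "contains": "v2.0"},
]


def fake_run(command: str) -> str:
    # Simulated `git tag -l` output after tags were recreated by hand
    # instead of fetched: v1.0 exists locally, v2.0 does not.
    return "v1.0\nv1.0.1\n"


score, failed = score_assertions(assertions, fake_run)
print(score, failed)
```

Under these assumptions, hand-creating a `v1.0` tag satisfies the first check without fetching anything, which would explain partial credit for a response that never contacts the remote.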
(empty output)
Failure: Model 'deepseek/deepseek-v4-flash:none' violated the reasoning_level=none invariant: provider reported 63 reasoning token(s)
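The failure above is not about the Git command at all: the run was rejected because a `none` reasoning configuration still reported reasoning tokens. A sketch of such a guard, assuming the harness knows the configured reasoning level and the provider-reported token count; the function name and signature are hypothetical:

```python
def check_reasoning_invariant(
    model: str, reasoning_level: str, reasoning_tokens: int
) -> None:
    """Raise if a run configured with no reasoning used reasoning tokens.

    Hypothetical guard; the message mirrors the failure text on this page.
    """
    if reasoning_level == "none" and reasoning_tokens > 0:
        raise RuntimeError(
            f"Model {model!r} violated the reasoning_level=none invariant: "
            f"provider reported {reasoning_tokens} reasoning token(s)"
        )


try:
    check_reasoning_invariant("deepseek/deepseek-v4-flash:none", "none", 63)
except RuntimeError as err:
    print(err)
```

Runs at other reasoning levels, or `none` runs that report zero reasoning tokens, pass through this check unchanged.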
mistralai/devstral-2512 FAIL 0% 45 in → 11 out
```bash git fetch origin --tags ```
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'contains': 'v1.0'}, {'type': 'git_output', 'command': 'git tag -l', 'contains': 'v2.0'}]
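The response above contains the right command but wraps it in a Markdown fence, contrary to the prompt's "Output ONLY the git command" instruction. Presumably the fenced text did not execute as a bare command, so neither tag was fetched; that is an inference from the failed assertions, not something the page states. A sketch of how a more lenient harness could unwrap such a response, with `strip_code_fence` as a hypothetical helper:

```python
import re

# Matches a response that is entirely one fenced block, with an optional
# language tag after the opening fence. Single-line fences without a
# language tag are ambiguous and are not handled correctly by this sketch.
FENCE_RE = re.compile(r"^```[\w+-]*\s+(.*?)\s*```$", re.DOTALL)


def strip_code_fence(response: str) -> str:
    """Return the text inside a Markdown fence, or the response unchanged."""
    text = response.strip()
    match = FENCE_RE.match(text)
    return match.group(1).strip() if match else text


print(strip_code_fence("```bash\ngit fetch origin --tags\n```"))
print(strip_code_fence("git fetch --tags origin"))
```

Whether to unwrap is a design choice: a strict harness treats the fence as a failure to follow the output instruction, which appears to be the behaviour recorded here.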
nvidia/nemotron-3-nano-30b-a3b:none__json_schema FAIL 0% 60 in → 11 out (0 reasoning)
git tag
JSON Schema Structured Output
(raw) { "command": "git tag" }
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'contains': 'v1.0'}, {'type': 'git_output', 'command': 'git tag -l', 'contains': 'v2.0'}]
poolside/laguna-xs.2:none FAIL 0% 99 in → 8 out (0 reasoning)
git ls-remote --tags origin
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'contains': 'v1.0'}, {'type': 'git_output', 'command': 'git tag -l', 'contains': 'v2.0'}]
poolside/laguna-xs.2:none__json_schema FAIL 0% 98 in → 12 out (0 reasoning)
git ls-remote --tags origin
JSON Schema Structured Output
(raw) {"command": "git ls-remote --tags origin"}
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'contains': 'v1.0'}, {'type': 'git_output', 'command': 'git tag -l', 'contains': 'v2.0'}]