GitBench
Push a tag to a bare remote
Tests the ability to push an existing tag to a bare remote repository, evaluating the tag-push workflow with remotes.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'App' > app.txt
  5. git add app.txt
  6. git commit -m 'Initial release'
  7. git tag v1.0
  8. mkdir -p ../remote-bare && cd ../remote-bare && git init --bare
  9. git remote add origin ../remote-bare
  10. git push origin main
Prompt
Push the tag 'v1.0' to the remote 'origin'. Output ONLY the git command, nothing else.
Expected
git push origin v1.0
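The setup and the expected command can be replayed locally with a sketch along these lines. It assumes a POSIX shell, git on the PATH, and a scratch directory from mktemp (the benchmark's real paths are not shown). It also pins the branch name to main so the setup's push works under any init.defaultBranch, and runs the bare-repo creation in a subshell so the cd does not persist. How the harness itself handles those two points is not stated on this page.

```shell
#!/bin/sh
set -eu

# Scratch location: an assumption, not the benchmark's actual path.
work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"

git init -q
# Pin the unborn branch to 'main' (assumption: the harness defaults to main).
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'App' > app.txt
git add app.txt
git commit -q -m 'Initial release'
git tag v1.0

# Subshell keeps the working directory in the source repo afterwards.
(mkdir -p ../remote-bare && cd ../remote-bare && git init -q --bare)
git remote add origin ../remote-bare
git push -q origin main

# The expected answer, then the command used by the grader's assertion.
git push -q origin v1.0
tags=$(git ls-remote --tags origin)
echo "$tags"
```

If the push succeeded, the last command lists a refs/tags/v1.0 entry, which is what the `git ls-remote --tags origin` assertion shown in the failure messages looks for.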
deepseek/deepseek-v4-flash:high PASS 100% 45 in → 43 out (34 reasoning)
git push origin v1.0
deepseek/deepseek-v4-flash:none PASS 100% 45 in → 8 out (0 reasoning)
git push origin v1.0
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 45 in → 13 out (0 reasoning)
git push origin v1.0
JSON Schema Structured Output
(raw) {"command": "git push origin v1.0"}
mistralai/devstral-2512__json_schema PASS 100% 46 in → 13 out
git push origin v1.0
JSON Schema Structured Output
(raw) {"command": "git push origin v1.0"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 58 in → 60 out (51 reasoning)
git push origin v1.0
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 59 in → 86 out (62 reasoning)
git push origin v1.0
JSON Schema Structured Output
(raw) { "command": "git push origin v1.0" }
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 59 in → 16 out (0 reasoning)
git push origin v1.0
JSON Schema Structured Output
(raw) { "command": "git push origin v1.0" }
poolside/laguna-xs.2:high PASS 100% 96 in → 60 out (50 reasoning)
git push origin v1.0
poolside/laguna-xs.2:high__json_schema PASS 100% 95 in → 115 out (102 reasoning)
git push origin v1.0
JSON Schema Structured Output
(raw) {"command": "git push origin v1.0"}
poolside/laguna-xs.2:none PASS 100% 94 in → 9 out (0 reasoning)
git push origin v1.0
poolside/laguna-xs.2:none__json_schema PASS 100% 96 in → 13 out (0 reasoning)
git push origin v1.0
JSON Schema Structured Output
(raw) {"command": "git push origin v1.0"}
Invalid JSON. Output: gitpush origin v1.0
JSON Schema Structured Output
Structured Output Error
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
mistralai/devstral-2512 FAIL 0% 46 in → 13 out
```bash git push origin v1.0 ```
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git ls-remote --tags origin', 'contains': 'v1.0'}]
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 59 in → 15 out (0 reasoning)
git tag v1.0 && git push origin v1.0
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git ls-remote --tags origin', 'contains': 'v1.0'}]
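The page does not say why the chained answer fails. One plausible explanation, assuming the harness runs the output as a single shell command, is that v1.0 already exists after setup, so `git tag v1.0` exits non-zero and `&&` never reaches the push. The sketch below reproduces only that part in a scratch repository (a hypothetical location); a shell assignment stands in for the push to record whether `&&` let it run.

```shell
#!/bin/sh
set -u

# Scratch repository with one commit and the tag from the setup steps.
work=$(mktemp -d)
cd "$work"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'App' > app.txt
git add app.txt
git commit -q -m 'Initial release'
git tag v1.0

# Creating the same tag a second time is rejected by git.
git tag v1.0 2>/dev/null
tag_status=$?

# The assignment plays the role of 'git push origin v1.0' in the chained answer.
push_ran=no
git tag v1.0 2>/dev/null && push_ran=yes

echo "git tag exit status: $tag_status"
echo "push reached: $push_ran"
```

Under that assumption the tag never reaches origin, which is consistent with the failed `git ls-remote --tags origin` assertion.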