GitBench
Rename a tag via delete and create
Tests the ability to rename a tag with a delete-and-create workflow. Git has no single command that renames a tag, so the test evaluates tag rename as a composite operation.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'App' > app.txt
  5. git add app.txt
  6. git commit -m 'Release v1'
  7. git tag v1.0-rc
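The setup steps above can be replayed locally as a single script. This is a minimal sketch, not the benchmark's own harness: the scratch directory created with `mktemp` and the `-q` flags are assumptions added here to keep the run isolated and quiet.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory; the benchmark's real working directory is not shown.
repo=$(mktemp -d)
cd "$repo"

# The seven setup commands, in order.
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'App' > app.txt
git add app.txt
git commit -q -m 'Release v1'
git tag v1.0-rc

# Inspect the starting state that the model is asked to change.
git tag -l
git log --oneline -1
```

After it runs, the repository holds one commit and one tag, `v1.0-rc`, pointing at that commit.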
Prompt
Rename the tag 'v1.0-rc' to 'v1.0' by moving the tag to the same commit. Output ONLY the git commands, one per line, nothing else.
Expected
git tag v1.0 v1.0-rc && git tag -d v1.0-rc
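The expected answer can be tried end to end with a short script. This is a sketch under assumptions: the scratch directory is hypothetical, and the two checks at the end mirror only the assertions quoted in the failure messages on this page, which may not be the benchmark's full assertion set.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository rebuilt from the setup commands.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'App' > app.txt
git add app.txt
git commit -q -m 'Release v1'
git tag v1.0-rc

# Expected answer: create the new tag at the commit the old tag
# points to, then delete the old tag.
git tag v1.0 v1.0-rc && git tag -d v1.0-rc

# Check 1: the tag listing must no longer contain 'v1.0-rc'.
case "$(git tag -l)" in
  *v1.0-rc*) echo "FAIL: old tag still present"; exit 1 ;;
esac

# Check 2: the new tag must resolve to the 'Release v1' commit.
git log v1.0 --oneline -1 | grep -q 'Release v1' \
  || { echo "FAIL: v1.0 does not point at 'Release v1'"; exit 1; }

echo "rename ok"
```

Creating the new tag before deleting the old one means the old name can still be used to locate the commit. Answers that delete first have to name the commit some other way, for example by hash.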
deepseek/deepseek-v4-flash:high PASS 100% 65 in → 629 out (498 reasoning)
git tag v1.0 v1.0-rc git tag -d v1.0-rc
deepseek/deepseek-v4-flash:none PASS 100% 65 in → 26 out (0 reasoning)
git tag -f v1.0 8c74fa3 git tag -d v1.0-rc
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 67 in → 32 out (0 reasoning)
git tag -d v1.0-rc && git tag v1.0 eac4df7
JSON Schema Structured Output
(raw) { "command": "git tag -d v1.0-rc && git tag v1.0 eac4df7" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 79 in → 48 out (0 reasoning)
git tag -d v1.0-rc git tag v1.0 90b0c3e git push --force origin v1.0 git push --delete origin v1.0-rc
poolside/laguna-xs.2:high PASS 100% 112 in → 73 out (48 reasoning)
git tag -d v1.0-rc git tag v1.0 be02602
poolside/laguna-xs.2:high__json_schema PASS 100% 111 in → 344 out (295 reasoning)
git tag -d v1.0-rc && git tag v1.0 79ecf30 && git push origin :refs/tags/v1.0-rc && git push origin v1.0
JSON Schema Structured Output
(raw) {"command": "git tag -d v1.0-rc && git tag v1.0 79ecf30 && git push origin :refs/tags/v1.0-rc && git push origin v1.0"}
poolside/laguna-xs.2:none PASS 100% 113 in → 26 out (0 reasoning)
git tag -d v1.0-rc git tag v1.0 7b02195
poolside/laguna-xs.2:none__json_schema PASS 100% 112 in → 29 out (0 reasoning)
git tag -d v1.0-rc git tag v1.0 72ea6a6
JSON Schema Structured Output
(raw) {"command": "git tag -d v1.0-rc\ngit tag v1.0 72ea6a6"}
deepseek/deepseek-v4-flash:high__json_schema FAIL 66.7% 66 in → 517 out (493 reasoning)
git tag -f v1.0 v1.0-rc
JSON Schema Structured Output
(raw) { "command": "git tag -f v1.0 v1.0-rc" }
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'not_contains': 'v1.0-rc'}]
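The failed assertion above can be reproduced locally. The sketch below assumes a hypothetical scratch repository built from the setup commands and runs the single-command answer. `-f` only permits overwriting an existing `v1.0`; it does not remove `v1.0-rc`.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository rebuilt from the setup commands.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'App' > app.txt
git add app.txt
git commit -q -m 'Release v1'
git tag v1.0-rc

# The failing answer: creates v1.0 but never deletes v1.0-rc.
git tag -f v1.0 v1.0-rc

# Both tags are listed, so a 'not_contains: v1.0-rc' check on this output fails.
git tag -l
```

The new tag is created correctly, which is consistent with the partial score, but the rename is incomplete until `git tag -d v1.0-rc` runs.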
mistralai/devstral-2512__json_schema FAIL 66.7% 66 in → 19 out
git tag v1.0 d20c347
JSON Schema Structured Output
(raw) {"command": "git tag v1.0 d20c347"}
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'not_contains': 'v1.0-rc'}]
nvidia/nemotron-3-nano-30b-a3b:high__json_schema FAIL 66.7% 76 in → 1,277 out (1,098 reasoning)
git tag v1.0 v1.0-rc
JSON Schema Structured Output
(raw) { "command": "git tag v1.0 v1.0-rc" }
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'not_contains': 'v1.0-rc'}]
mistralai/devstral-2512 FAIL 33.3% 66 in → 31 out
```git git tag v1.0 96977a7 git tag -d v1.0-rc ```
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'not_contains': 'v1.0-rc'}, {'type': 'git_output', 'command': 'git log v1.0 --oneline -1', 'contains': 'Release v1'}]
nvidia/nemotron-3-nano-30b-a3b:high FAIL 33.3% 79 in → 1,287 out (1,049 reasoning)
git push origin --delete v1.0-rc git tag -f v1.0 v1.0-rc git push origin v1.0
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'not_contains': 'v1.0-rc'}, {'type': 'git_output', 'command': 'git log v1.0 --oneline -1', 'contains': 'Release v1'}]
nvidia/nemotron-3-nano-30b-a3b:none__json_schema FAIL 33.3% 78 in → 19 out (0 reasoning)
git branch -d v1.0-rc
JSON Schema Structured Output
(raw) { "command": "git branch -d v1.0-rc" }
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'not_contains': 'v1.0-rc'}, {'type': 'git_output', 'command': 'git log v1.0 --oneline -1', 'contains': 'Release v1'}]