GitBench
Create a lightweight tag on HEAD
Tests the ability to create a lightweight tag on HEAD and evaluates basic knowledge of the tag-creation command.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'Hello' > hello.txt
  5. git add hello.txt
  6. git commit -m 'Initial commit'
  7. echo 'World' > world.txt
  8. git add world.txt
  9. git commit -m 'Add world'
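The setup steps above can be replayed locally to inspect the starting state. This is a minimal sketch, assuming `git` is on the PATH; the scratch directory from `mktemp` and the `-q` flags are assumptions added for illustration, since the harness's actual working directory is not stated.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch location; the benchmark's real directory is not given.
repo=$(mktemp -d)
cd "$repo"

# Setup commands from the test case (-q only silences git's output).
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'

echo 'Hello' > hello.txt
git add hello.txt
git commit -q -m 'Initial commit'

echo 'World' > world.txt
git add world.txt
git commit -q -m 'Add world'

# Summarize the resulting history: commit count and the subject at HEAD.
commit_count=$(git rev-list --count HEAD)
head_subject=$(git log -1 --format=%s)
echo "$commit_count commits, HEAD: $head_subject"
```

After the script runs, the repository holds two commits with 'Add world' at HEAD, which is the state the model's command is executed against.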
Prompt
Create a lightweight git tag named 'v1.0' pointing at the current HEAD. Output ONLY the git command, nothing else.
Expected
git tag v1.0
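To see what "lightweight" means for the expected command, the sketch below creates the tag in a scratch repository and compares it with an annotated tag. It assumes `git` is on the PATH; the scratch directory and the comparison tag name `v1.0-annotated` are hypothetical and not part of the test case.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository with a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'Hello' > hello.txt
git add hello.txt
git commit -q -m 'Initial commit'

# The expected answer: a lightweight tag on the current HEAD.
git tag v1.0

# Illustrative annotated tag for comparison (name is an assumption).
git tag -a v1.0-annotated -m 'v1.0'

# A lightweight tag ref names the commit directly;
# an annotated tag ref names a separate tag object.
light_type=$(git cat-file -t v1.0)
annotated_type=$(git cat-file -t v1.0-annotated)

# Both forms resolve to the HEAD commit once peeled.
light_commit=$(git rev-parse 'v1.0^{commit}')
head_commit=$(git rev-parse HEAD)

echo "v1.0 -> $light_type, v1.0-annotated -> $annotated_type"
```

The `git cat-file -t` check is one way to tell the two kinds of tag apart: it reports `commit` for the lightweight tag and `tag` for the annotated one.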
deepseek/deepseek-v4-flash:high PASS 100% 54 in → 45 out (37 reasoning)
git tag v1.0
deepseek/deepseek-v4-flash:none PASS 100% 55 in → 7 out (0 reasoning)
git tag v1.0
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 54 in → 12 out (0 reasoning)
git tag v1.0
JSON Schema Structured Output
(raw) {"command": "git tag v1.0"}
mistralai/devstral-2512__json_schema PASS 100% 57 in → 12 out
git tag v1.0
JSON Schema Structured Output
(raw) {"command": "git tag v1.0"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 70 in → 177 out (157 reasoning)
git tag v1.0
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 69 in → 88 out (71 reasoning)
git tag v1.0
JSON Schema Structured Output
(raw) { "command": "git tag v1.0" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 66 in → 9 out (0 reasoning)
git tag -f v1.0
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 66 in → 15 out (0 reasoning)
git tag v1.0
JSON Schema Structured Output
(raw) { "command": "git tag v1.0" }
poolside/laguna-xs.2:high PASS 100% 106 in → 121 out (113 reasoning)
git tag v1.0
poolside/laguna-xs.2:high__json_schema PASS 100% 105 in → 146 out (134 reasoning)
git tag v1.0
JSON Schema Structured Output
(raw) {"command": "git tag v1.0"}
poolside/laguna-xs.2:none PASS 100% 105 in → 18 out (0 reasoning)
git tag -a v1.0 -m "v1.0"
poolside/laguna-xs.2:none__json_schema PASS 100% 108 in → 23 out (0 reasoning)
git tag -a v1.0 -m "v1.0" HEAD
JSON Schema Structured Output
(raw) {"command": "git tag -a v1.0 -m \"v1.0\" HEAD"}
Invalid JSON. Output: git tag v1.0
JSON Schema Structured Output
Structured Output Error
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
mistralai/devstral-2512 FAIL 0% 55 in → 12 out
```bash git tag v1.0 ```
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'contains': 'v1.0'}, {'type': 'git_output', 'command': 'git log v1.0 --oneline -1', 'contains': 'Add world'}]
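The two `git_output` assertions named in the failure above can be approximated by hand. This is a sketch under assumptions: `check_contains` is a hypothetical helper, not the harness's real implementation, and substring matching with `grep -F` is a guess at what `contains` means.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository rebuilt from the setup commands.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'Hello' > hello.txt
git add hello.txt
git commit -q -m 'Initial commit'
echo 'World' > world.txt
git add world.txt
git commit -q -m 'Add world'

# check_contains EXPECTED CMD...: run CMD and report whether its output
# contains EXPECTED as a fixed string (assumed meaning of 'contains').
check_contains() {
    expected=$1
    shift
    if "$@" | grep -qF -- "$expected"; then
        echo pass
    else
        echo fail
    fi
}

# Before any tag exists, the first assertion cannot hold.
before_tag=$(check_contains 'v1.0' git tag -l)

git tag v1.0

# The two assertions from the failure message, applied after tagging.
after_tag=$(check_contains 'v1.0' git tag -l)
after_log=$(check_contains 'Add world' git log v1.0 --oneline -1)

echo "before: $before_tag, tag -l: $after_tag, log: $after_log"
```

Both checks only confirm that a ref named `v1.0` exists and resolves to the 'Add world' commit. An annotated tag satisfies them as well, because `git log` peels the tag to its commit.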