GitBench
Create a signed tag (simulated with annotated tag)
Tests the model's ability to create a signed tag, simulated here with an annotated tag because no GPG key is configured. Evaluates awareness of signed-tag creation.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. git config tag.gpgSign false
  5. echo 'App' > app.txt
  6. git add app.txt
  7. git commit -m 'Secure release'
Prompt
Create an annotated git tag named 'v1.0-signed' on HEAD with the message 'Signed release'. Since no GPG key is configured, create it as a regular annotated tag. Output ONLY the git command, nothing else.
Expected
git tag -a v1.0-signed -m 'Signed release'
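A minimal sketch of how the setup steps and the expected command could be replayed locally, then checked against the three `git_output` assertions that appear in the failure reports (`git tag -l`, `git cat-file -t`, `git tag -n1`). The temporary directory, the `-q` flags, and the `check` helper are assumptions made for illustration; they are not taken from GitBench's own harness.

```shell
#!/bin/sh
set -eu

# Assumption: any empty throwaway directory is an acceptable sandbox.
repo=$(mktemp -d)
cd "$repo"

# Setup, in the order listed above (-q only silences informational output).
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
git config tag.gpgSign false
echo 'App' > app.txt
git add app.txt
git commit -q -m 'Secure release'

# The expected answer.
git tag -a v1.0-signed -m 'Signed release'

# Hypothetical helper: succeeds when the command's output contains the text.
check() {
  needle=$1
  shift
  "$@" | grep -q -- "$needle"
}

# The three "contains" checks named in the failure reports.
check 'v1.0-signed'    git tag -l
check 'tag'            git cat-file -t v1.0-signed
check 'Signed release' git tag -n1 v1.0-signed

echo 'all assertions passed'
```

The second check is the one that separates an annotated tag from a lightweight one: `git cat-file -t` reports `tag` for an annotated tag object, but `commit` when the tag name points directly at a commit.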
deepseek/deepseek-v4-flash:high PASS 100% 67 in → 104 out (85 reasoning)
git tag -a v1.0-signed -m "Signed release" HEAD
deepseek/deepseek-v4-flash:none PASS 100% 83 in → 17 out (0 reasoning)
git tag -a v1.0-signed -m 'Signed release' HEAD
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 65 in → 25 out (0 reasoning)
git tag -a v1.0-signed -m 'Signed release'
JSON Schema Structured Output
(raw) { "command": "git tag -a v1.0-signed -m 'Signed release'" }
mistralai/devstral-2512__json_schema PASS 100% 66 in → 23 out
git tag -a v1.0-signed -m "Signed release" HEAD
JSON Schema Structured Output
(raw) {"command": "git tag -a v1.0-signed -m \"Signed release\" HEAD"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 81 in → 400 out (403 reasoning)
git tag -a v1.0-signed -m "Signed release" HEAD
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 81 in → 123 out (102 reasoning)
git tag -a v1.0-signed -m "Signed release" HEAD
JSON Schema Structured Output
(raw) { "command": "git tag -a v1.0-signed -m \"Signed release\" HEAD" }
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 80 in → 24 out (0 reasoning)
git tag -a v1.0-signed -m 'Signed release'
JSON Schema Structured Output
(raw) { "command": "git tag -a v1.0-signed -m 'Signed release'" }
poolside/laguna-xs.2:high PASS 100% 116 in → 215 out (199 reasoning)
git tag -a v1.0-signed -m 'Signed release'
poolside/laguna-xs.2:high__json_schema PASS 100% 117 in → 170 out (150 reasoning)
git tag -a v1.0-signed -m 'Signed release'
JSON Schema Structured Output
(raw) {"command": "git tag -a v1.0-signed -m 'Signed release'"}
poolside/laguna-xs.2:none__json_schema PASS 100% 116 in → 22 out (0 reasoning)
git tag -a v1.0-signed -m 'Signed release' HEAD
JSON Schema Structured Output
(raw) {"command": "git tag -a v1.0-signed -m 'Signed release' HEAD"}
Invalid JSON. Output: git tag -a v1.0-signed -m "Signed release" HEAD
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
mistralai/devstral-2512 FAIL 0% 68 in → 23 out
```bash git tag -a v1.0-signed -m "Signed release" HEAD ```
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'contains': 'v1.0-signed'}, {'type': 'git_output', 'command': 'git cat-file -t v1.0-signed', 'contains': 'tag'}, {'type': 'git_output', 'command': 'git tag -n1 v1.0-signed', 'contains': 'Signed release'}]
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 81 in → 29 out (0 reasoning)
git tag -a v1.0-signed -m 'Signed release' LOG: 0a1d26f Secure release
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'contains': 'v1.0-signed'}, {'type': 'git_output', 'command': 'git cat-file -t v1.0-signed', 'contains': 'tag'}, {'type': 'git_output', 'command': 'git tag -n1 v1.0-signed', 'contains': 'Signed release'}]
poolside/laguna-xs.2:none FAIL 0% 115 in → 23 out (0 reasoning)
```bash git tag -a v1.0-signed -m 'Signed release' HEAD ```
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git tag -l', 'contains': 'v1.0-signed'}, {'type': 'git_output', 'command': 'git cat-file -t v1.0-signed', 'contains': 'tag'}, {'type': 'git_output', 'command': 'git tag -n1 v1.0-signed', 'contains': 'Signed release'}]
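Two of the failing replies above wrap an otherwise correct command in a markdown fence, which conflicts with the prompt's "Output ONLY the git command" requirement. The following sketch shows one way to peel such a fence off a reply so it can be compared with the expected command when reading the log. The `strip_fence` function is hypothetical, and the source does not say whether GitBench's harness performs any normalisation of this kind; the failures suggest it does not.

```shell
#!/bin/sh
set -eu

# Hypothetical normaliser: remove a leading ```lang marker and a trailing ```
# marker, drop lines left empty, and keep the first remaining line.
strip_fence() {
  printf '%s\n' "$1" \
    | sed -e 's/^```[A-Za-z]*[[:space:]]*//' -e 's/[[:space:]]*```$//' \
    | sed -e '/^[[:space:]]*$/d' \
    | head -n 1
}

# One of the failing replies, as recorded in the log above.
raw='```bash git tag -a v1.0-signed -m "Signed release" HEAD ```'

strip_fence "$raw"
```

The sketch handles both the flattened single-line form shown in the log and a reply whose fence markers sit on their own lines. It does not attempt to repair replies that append other text after the command.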