GitBench
Create worktree from tag
Tests the ability to create a Git worktree that checks out a specific tag.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'v1.0' > version.txt
  5. git add version.txt
  6. git commit -m 'Version 1.0'
  7. git tag v1.0
  8. echo 'v2.0' > version.txt
  9. git add version.txt
  10. git commit -m 'Version 2.0'
  11. git tag v2.0
Prompt
Create a worktree at '../v1-checkout' pointing at the tag 'v1.0'. Output ONLY the git worktree command, nothing else.
Expected
git worktree add ../v1-checkout v1.0
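To try the expected command locally, the setup steps and the two checks named in the failure messages on this page (`dir_exists` and `file_content`) can be combined into one script. This is a minimal sketch, not the benchmark's own harness: the temporary directory, the `work` variable, and the `-q` flags are assumptions added so the script runs quietly without touching an existing repository.

```shell
#!/bin/sh
set -eu

# Assumption: work in a throwaway directory so '../v1-checkout'
# is created inside it rather than next to a real project.
work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"

# Setup steps from the list above (-q only silences output).
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'v1.0' > version.txt
git add version.txt
git commit -q -m 'Version 1.0'
git tag v1.0
echo 'v2.0' > version.txt
git add version.txt
git commit -q -m 'Version 2.0'
git tag v2.0

# The expected command. A tag is not a branch, so the new worktree
# is checked out with a detached HEAD at the tagged commit.
git worktree add ../v1-checkout v1.0

# The two checks that appear in the failure messages.
[ -d ../v1-checkout ] && echo 'dir_exists: ok'
[ "$(cat ../v1-checkout/version.txt)" = 'v1.0' ] && echo 'file_content: ok'
```

The checks inspect the resulting files rather than the command text, which is consistent with the `refs/tags/v1.0` spelling in the results also being scored as a pass.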
deepseek/deepseek-v4-flash:high PASS 100% 122 in → 132 out (117 reasoning)
git worktree add ../v1-checkout v1.0
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 119 in → 151 out (129 reasoning)
git worktree add ../v1-checkout v1.0
JSON Schema Structured Output
(raw) { "command": "git worktree add ../v1-checkout v1.0" }
deepseek/deepseek-v4-flash:none PASS 100% 121 in → 14 out (0 reasoning)
git worktree add ../v1-checkout v1.0
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 122 in → 22 out (0 reasoning)
git worktree add ../v1-checkout v1.0
JSON Schema Structured Output
(raw) { "command": "git worktree add ../v1-checkout v1.0" }
mistralai/devstral-2512__json_schema PASS 100% 142 in → 19 out
git worktree add ../v1-checkout v1.0
JSON Schema Structured Output
(raw) {"command": "git worktree add ../v1-checkout v1.0"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 154 in → 157 out (132 reasoning)
git worktree add ../v1-checkout v1.0
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 156 in → 115 out (91 reasoning)
git worktree add ../v1-checkout v1.0
JSON Schema Structured Output
(raw) { "command": "git worktree add ../v1-checkout v1.0" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 154 in → 19 out (0 reasoning)
git worktree add '../v1-checkout' 'refs/tags/v1.0'
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 155 in → 22 out (0 reasoning)
git worktree add ../v1-checkout v1.0
JSON Schema Structured Output
(raw) { "command": "git worktree add ../v1-checkout v1.0" }
poolside/laguna-xs.2:high PASS 100% 181 in → 118 out (104 reasoning)
git worktree add ../v1-checkout v1.0
poolside/laguna-xs.2:high__json_schema PASS 100% 179 in → 144 out (125 reasoning)
git worktree add ../v1-checkout v1.0
JSON Schema Structured Output
(raw) {"command": "git worktree add ../v1-checkout v1.0"}
poolside/laguna-xs.2:none__json_schema PASS 100% 182 in → 23 out (0 reasoning)
git worktree add ../v1-checkout v1.0
JSON Schema Structured Output
(raw) { "command": "git worktree add ../v1-checkout v1.0" }
mistralai/devstral-2512 FAIL 0% 142 in → 19 out
```sh git worktree add ../v1-checkout v1.0 ```
Failure: Failed assertions: [{'type': 'dir_exists', 'path': '../v1-checkout'}, {'type': 'file_content', 'path': '../v1-checkout/version.txt', 'value': 'v1.0'}]
poolside/laguna-xs.2:none FAIL 0% 179 in → 19 out (0 reasoning)
```bash git worktree add ../v1-checkout v1.0 ```
Failure: Failed assertions: [{'type': 'dir_exists', 'path': '../v1-checkout'}, {'type': 'file_content', 'path': '../v1-checkout/version.txt', 'value': 'v1.0'}]
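Both failing responses contain the expected command wrapped in a Markdown code fence, and both fail the same filesystem assertions, which suggests the fenced text was not executed as a command. How GitBench actually processes responses is not shown on this page. The following is only a sketch of how a harness could strip such fences before running a response; the function name `strip_fences` and the `fence` and `response` variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of GitBench: remove a leading fence
# marker (with optional language tag), a trailing fence marker, and
# any lines left empty, from a model response read on stdin.
strip_fences() {
    sed -e 's/^[[:space:]]*```[[:alnum:]]*[[:space:]]*//' \
        -e 's/[[:space:]]*```[[:space:]]*$//' \
        -e '/^[[:space:]]*$/d'
}

# Build a fenced response like the failing outputs above.
fence='```'
response=$(printf '%ssh\n%s\n%s\n' "$fence" \
    'git worktree add ../v1-checkout v1.0' "$fence")

# Prints the bare command without the fence lines.
printf '%s\n' "$response" | strip_fences
```

The sketch assumes a language tag, if present, is separated from the command by whitespace or a newline; a fence followed immediately by the command with no tag would lose its first word.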