GitBench
Make changes in worktree and commit
Tests the ability to make changes in a worktree and commit them. Evaluates the worktree-as-workspace workflow.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'Hello' > hello.txt
  5. git add hello.txt
  6. git commit -m 'Initial commit'
  7. git checkout -b feature
  8. echo 'Feature' > feature.txt
  9. git add feature.txt
  10. git commit -m 'Add feature'
  11. git checkout main
Prompt
Create a git worktree for the 'feature' branch at '../feature-wt'. Then write 'Updated' to the file '../feature-wt/feature.txt'. Output ONLY the commands to run, one per line.
Expected
git worktree add ../feature-wt feature
echo 'Updated' > ../feature-wt/feature.txt
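The setup and expected commands can be replayed locally to see what the file_content assertion checks. The following is a minimal sketch, not the benchmark's actual harness: it assumes a POSIX shell and an installed Git with worktree support, and the helper name `run_scenario` is hypothetical. It also adds one command that is not in the original setup, `git branch -M main`, because a plain `git init` may create a default branch with a different name, in which case the final `git checkout main` would fail.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Setup commands from the task, in order.
SETUP = [
    "git init",
    "git config user.email 'test@test.com'",
    "git config user.name 'Test User'",
    "echo 'Hello' > hello.txt",
    "git add hello.txt",
    "git commit -m 'Initial commit'",
    # Assumption, not part of the original setup: force the branch name to
    # `main` so that the later `git checkout main` works on any Git default.
    "git branch -M main",
    "git checkout -b feature",
    "echo 'Feature' > feature.txt",
    "git add feature.txt",
    "git commit -m 'Add feature'",
    "git checkout main",
]

# The expected answer, one command per entry.
EXPECTED = [
    "git worktree add ../feature-wt feature",
    "echo 'Updated' > ../feature-wt/feature.txt",
]


def run_scenario():
    """Run setup plus expected commands and return the worktree file content.

    Returns None when git is not installed, so callers can skip the check.
    """
    if shutil.which("git") is None:
        return None
    with tempfile.TemporaryDirectory() as base:
        # The repo lives one level down so that `../feature-wt` stays
        # inside the temporary directory.
        repo = Path(base) / "repo"
        repo.mkdir()
        for command in SETUP + EXPECTED:
            subprocess.run(command, shell=True, cwd=repo, check=True,
                           capture_output=True)
        # `echo` appends a newline, so strip before comparing.
        return (Path(base) / "feature-wt" / "feature.txt").read_text().strip()


if __name__ == "__main__":
    print(run_scenario())
```

When Git is available and the commands behave as intended, the returned string should equal the asserted value `Updated`.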
deepseek/deepseek-v4-flash:high PASS 100% 136 in → 280 out (273 reasoning)
git worktree add ../feature-wt feature
echo 'Updated' > ../feature-wt/feature.txt
deepseek/deepseek-v4-flash:none PASS 100% 133 in → 22 out (0 reasoning)
git worktree add ../feature-wt feature
echo 'Updated' > ../feature-wt/feature.txt
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 137 in → 36 out (0 reasoning)
git worktree add ../feature-wt feature
echo 'Updated' > ../feature-wt/feature.txt
JSON Schema Structured Output
(raw) { "commands": [ "git worktree add ../feature-wt feature", "echo 'Updated' > ../feature-wt/feature.txt" ] }
mistralai/devstral-2512__json_schema PASS 100% 154 in → 37 out
git worktree add ../feature-wt feature
echo 'Updated' > ../feature-wt/feature.txt
JSON Schema Structured Output
(raw) { "commands": [ "git worktree add ../feature-wt feature", "echo 'Updated' > ../feature-wt/feature.txt" ] }
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 167 in → 590 out (560 reasoning)
git worktree add ../feature-wt feature
echo Updated > ../feature-wt/feature.txt
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 164 in → 313 out (299 reasoning)
git worktree add ../feature-wt feature
echo Updated > ../feature-wt/feature.txt
JSON Schema Structured Output
(raw) { "commands": [ "git worktree add ../feature-wt feature", "echo Updated > ../feature-wt/feature.txt" ] }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 165 in → 23 out (0 reasoning)
git worktree add ../feature-wt feature
echo "Updated" > ../feature-wt/feature.txt
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 168 in → 35 out (0 reasoning)
git worktree add ../feature-wt feature
echo Updated > ../feature-wt/feature.txt
JSON Schema Structured Output
(raw) { "commands": [ "git worktree add ../feature-wt feature", "echo Updated > ../feature-wt/feature.txt" ] }
poolside/laguna-xs.2:high PASS 100% 190 in → 478 out (432 reasoning)
git worktree add ../feature-wt feature
echo "Updated" > ../feature-wt/feature.txt
</think>
git worktree add ../feature-wt feature
echo "Updated" > ../feature-wt/feature.txt
poolside/laguna-xs.2:high__json_schema PASS 100% 191 in → 201 out (172 reasoning)
git worktree add ../feature-wt feature
echo 'Updated' > ../feature-wt/feature.txt
JSON Schema Structured Output
(raw) {"commands": ["git worktree add ../feature-wt feature", "echo 'Updated' > ../feature-wt/feature.txt"]}
poolside/laguna-xs.2:none PASS 100% 193 in → 23 out (0 reasoning)
git worktree add ../feature-wt feature
echo "Updated" > ../feature-wt/feature.txt
poolside/laguna-xs.2:none__json_schema PASS 100% 192 in → 38 out (0 reasoning)
git worktree add ../feature-wt feature
echo 'Updated' > ../feature-wt/feature.txt
JSON Schema Structured Output
(raw) { "commands": [ "git worktree add ../feature-wt feature", "echo 'Updated' > ../feature-wt/feature.txt" ]}
Invalid JSON. Output:
git worktree add ../feature-wt feature
echo "Updated" > ../feature-wt/feature.txt
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
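The error text above matches the message that Python's standard `json` module raises when the input does not begin with a JSON value. The sketch below reproduces it under the assumption that structured responses are decoded with `json.loads`; the helper name `parse_commands` is hypothetical and not taken from the benchmark's code.

```python
import json


def parse_commands(raw: str) -> list:
    """Return the "commands" array from a structured-output response."""
    # json.loads raises json.JSONDecodeError when `raw` is not valid JSON.
    payload = json.loads(raw)
    return payload["commands"]


# A response in the JSON shape shown in the passing entries.
structured = (
    '{ "commands": [ "git worktree add ../feature-wt feature", '
    '"echo \'Updated\' > ../feature-wt/feature.txt" ] }'
)

# A plain-text response: valid commands, but not JSON.
plain = (
    "git worktree add ../feature-wt feature\n"
    'echo "Updated" > ../feature-wt/feature.txt'
)

commands = parse_commands(structured)

try:
    parse_commands(plain)
    error_message = None
except json.JSONDecodeError as err:
    error_message = f"Failed to parse structured JSON response: {err}"

print(commands)
print(error_message)
```

The plain-text response contains the same two commands as the passing runs, so under this reading the failure reflects the output format rather than the Git commands themselves.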
mistralai/devstral-2512 FAIL 0% 149 in → 28 out
```bash
git worktree add ../feature-wt feature
echo "Updated" > ../feature-wt/feature.txt
```
Failure: Failed assertions: [{'type': 'file_content', 'path': '../feature-wt/feature.txt', 'value': 'Updated'}]
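The failing response is wrapped in a Markdown code fence, whereas the prompt asks for ONLY the commands. The results do not say whether the fence is what broke the file_content assertion; that is one plausible reading. As an illustration only, a hypothetical normalizer (`strip_code_fence`, not part of GitBench as far as the page shows) could drop fence lines before the commands are executed:

```python
def strip_code_fence(output: str) -> list:
    """Return the command lines of a response, skipping fences and blanks."""
    commands = []
    for line in output.splitlines():
        stripped = line.strip()
        # Skip empty lines and Markdown fence markers such as ```bash or ```.
        if not stripped or stripped.startswith("```"):
            continue
        commands.append(stripped)
    return commands


# The failing response, assuming its commands were on separate lines.
fenced = (
    "```bash\n"
    "git worktree add ../feature-wt feature\n"
    'echo "Updated" > ../feature-wt/feature.txt\n'
    "```"
)

print(strip_code_fence(fenced))
```

Whether a benchmark should apply this kind of cleanup is a design choice: stripping fences measures Git knowledge alone, while executing the output verbatim also measures adherence to the "Output ONLY the commands" instruction.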