GitBench
Create worktree and make commit
Tests the ability to create a worktree and immediately make a commit in it, evaluating the end-to-end worktree commit workflow.

These commands set up the repo before the model sees the prompt. They define the starting file structure, branches, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'Hello' > hello.txt
  5. git add hello.txt
  6. git commit -m 'Initial commit'
  7. git checkout -b feature
  8. echo 'Feature' > feature.txt
  9. git add feature.txt
  10. git commit -m 'Add feature'
  11. git checkout main
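The setup steps can be reproduced locally with a short POSIX shell script. This is a sketch, not GitBench's own harness. It assumes the repository sits in a subdirectory (named `repo` here, an arbitrary choice) of a temporary directory, so that the `../feature-wt` path used by the prompt stays inside that directory. It also names the initial branch `main` explicitly: step 11 checks out `main`, but a plain `git init` names the first branch according to the `init.defaultBranch` setting, which is `master` on many installations.

```shell
#!/bin/sh
# Sketch: rebuild the starting state for this task in a throwaway directory.
# Assumption: the repo lives in "$work/repo", so ../feature-wt resolves to
# "$work/feature-wt" and nothing is written outside the temp directory.
set -eu

work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"

git init -q
# Name the unborn branch "main" explicitly; step 11 depends on it.
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'

echo 'Hello' > hello.txt
git add hello.txt
git commit -q -m 'Initial commit'

git checkout -q -b feature
echo 'Feature' > feature.txt
git add feature.txt
git commit -q -m 'Add feature'
git checkout -q main

# Capture the resulting state, then clean up.
branch=$(git rev-parse --abbrev-ref HEAD)
feature_tip=$(git log -1 --format=%s feature)
main_commits=$(git rev-list --count main)
cd /
rm -rf "$work"

echo "on $branch; main has $main_commits commit(s); feature tip: $feature_tip"
```

Run with `sh`, it prints `on main; main has 1 commit(s); feature tip: Add feature`. Note that `main` holds only the initial commit, while `feature` is one commit ahead.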
Prompt
Create a worktree for 'feature' at '../feature-wt', then create a file '../feature-wt/new.txt' with content 'new file', add it and commit with message 'Add new file' in the worktree. Output ONLY the commands to run, one per line.
Expected
git worktree add ../feature-wt feature
echo 'new file' > ../feature-wt/new.txt
git -C ../feature-wt add new.txt
git -C ../feature-wt commit -m 'Add new file'
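The expected answer can be checked end to end with a sketch like the one below. It rebuilds the starting repository in a temporary directory (with the repo in a `repo` subdirectory and the first branch named `main` explicitly, both assumptions), runs the four expected commands from the main repository, and then applies the two checks that appear in this page's failure messages: the content of `../feature-wt/new.txt`, and whether `git -C ../feature-wt log --oneline -1` contains `Add new file`. How GitBench itself executes and grades the commands is not shown on this page, so treat this as an approximation.

```shell
#!/bin/sh
# Sketch: run the expected answer against a fresh copy of the starting repo
# and apply the two checks quoted in this page's failure messages.
# Assumption: commands run from inside the main repo ("$work/repo").
set -eu

work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"

# Starting state, condensed from the setup steps.
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'Hello' > hello.txt
git add hello.txt
git commit -q -m 'Initial commit'
git checkout -q -b feature
echo 'Feature' > feature.txt
git add feature.txt
git commit -q -m 'Add feature'
git checkout -q main

# Expected answer. 'feature' already exists, so no -b: the worktree simply
# checks out that branch. git -C makes add/commit act on the worktree.
git worktree add ../feature-wt feature
echo 'new file' > ../feature-wt/new.txt
git -C ../feature-wt add new.txt
git -C ../feature-wt commit -q -m 'Add new file'

# Check 1 (file_content) and check 2 (git_output).
content=$(cat ../feature-wt/new.txt)
last_log=$(git -C ../feature-wt log --oneline -1)
# Worktrees share refs: the 'feature' branch moved, 'main' did not.
feature_tip=$(git log -1 --format=%s feature)
main_tip=$(git log -1 --format=%s main)

cd /
rm -rf "$work"
echo "content=$content | last_log=$last_log | main_tip=$main_tip"
```

Because every linked worktree shares the repository's refs, the new commit is visible on `feature` from the main repository as well, while `main` still points at `Initial commit`.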
deepseek/deepseek-v4-flash:none PASS 100% 148 in → 49 out (0 reasoning)
git worktree add ../feature-wt feature
echo 'new file' > ../feature-wt/new.txt
git -C ../feature-wt add new.txt
git -C ../feature-wt commit -m 'Add new file'
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 145 in → 68 out (0 reasoning)
git worktree add ../feature-wt feature
echo 'new file' > ../feature-wt/new.txt
git -C ../feature-wt add new.txt
git -C ../feature-wt commit -m 'Add new file'
JSON Schema Structured Output
(raw) { "commands": [ "git worktree add ../feature-wt feature", "echo 'new file' > ../feature-wt/new.txt", "git -C ../feature-wt add new.txt", "git -C ../feature-wt commit -m 'Add new file'" ] }
mistralai/devstral-2512__json_schema PASS 100% 166 in → 67 out
git worktree add ../feature-wt feature
echo 'new file' > ../feature-wt/new.txt
git -C ../feature-wt add new.txt
git -C ../feature-wt commit -m 'Add new file'
JSON Schema Structured Output
(raw) { "commands": [ "git worktree add ../feature-wt feature", "echo 'new file' > ../feature-wt/new.txt", "git -C ../feature-wt add new.txt", "git -C ../feature-wt commit -m 'Add new file'" ] }
poolside/laguna-xs.2:high__json_schema PASS 100% 202 in → 971 out (913 reasoning)
git worktree add ../feature-wt feature
echo 'new file' > ../feature-wt/new.txt
git -C ../feature-wt add new.txt
git -C ../feature-wt commit -m 'Add new file'
JSON Schema Structured Output
(raw) {"commands": ["git worktree add ../feature-wt feature", "echo 'new file' > ../feature-wt/new.txt", "git -C ../feature-wt add new.txt", "git -C ../feature-wt commit -m 'Add new file'"]}
nvidia/nemotron-3-nano-30b-a3b:high FAIL 50% 179 in → 622 out (596 reasoning)
git worktree add ../feature-wt feature
echo new file > ../feature-wt/new.txt
git add new.txt
git commit -m "Add new file"
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git -C ../feature-wt log --oneline -1', 'contains': 'Add new file'}]
nvidia/nemotron-3-nano-30b-a3b:high__json_schema FAIL 50% 184 in → 4,049 out (4,357 reasoning)
git worktree add ../feature-wt feature
echo 'new file' > ../feature-wt/new.txt
git add ../feature-wt/new.txt
git commit -m 'Add new file'
JSON Schema Structured Output
(raw) { "commands": [ "git worktree add ../feature-wt feature", "echo 'new file' > ../feature-wt/new.txt", "git add ../feature-wt/new.txt", "git commit -m 'Add new file'" ] }
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git -C ../feature-wt log --oneline -1', 'contains': 'Add new file'}]
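Both `nemotron-3-nano-30b-a3b:high` runs wrote the file (50%) but dropped `-C ../feature-wt` from `git add` and `git commit`, so those two commands act on the main worktree rather than on `../feature-wt`. The sketch below replays the JSON-schema variant's four commands, assuming a local copy of the starting repo in a temporary directory and that every command runs from the main repository. From `main`, `git add ../feature-wt/new.txt` is refused because the path lies outside the main worktree, and the commit then finds nothing staged, so the worktree's latest commit is still `Add feature` and `new.txt` is left untracked there.

```shell
#!/bin/sh
# Sketch: replay nemotron-3-nano-30b-a3b:high__json_schema's commands.
# Assumption: every command runs from the main repo ("$work/repo").
set -eu

work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'Hello' > hello.txt && git add hello.txt && git commit -q -m 'Initial commit'
git checkout -q -b feature
echo 'Feature' > feature.txt && git add feature.txt && git commit -q -m 'Add feature'
git checkout -q main

git worktree add ../feature-wt feature
echo 'new file' > ../feature-wt/new.txt
# No -C: both commands target the main worktree, not ../feature-wt.
git add ../feature-wt/new.txt >/dev/null 2>&1 && add_ok=yes || add_ok=no
git commit -q -m 'Add new file' >/dev/null 2>&1 && commit_ok=yes || commit_ok=no

content=$(cat ../feature-wt/new.txt)                 # check 1 passes
last_log=$(git -C ../feature-wt log --oneline -1)    # check 2 fails
wt_status=$(git -C ../feature-wt status --porcelain)

cd /
rm -rf "$work"
echo "add_ok=$add_ok commit_ok=$commit_ok status='$wt_status'"
```

The non-JSON `:high` run fails the same way for a slightly different reason: its `git add new.txt` looks for `new.txt` in the main worktree, where no such file exists.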
poolside/laguna-xs.2:none FAIL 50% 209 in → 45 out (0 reasoning)
git worktree add ../feature-wt feature
echo "new file" > ../feature-wt/new.txt
cd ../feature-wt
git add new.txt
git commit -m "Add new file"
Failure: Failed assertions: [{'type': 'git_output', 'command': 'git -C ../feature-wt log --oneline -1', 'contains': 'Add new file'}]
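`laguna-xs.2:none` wrote the file (50%) but put `cd ../feature-wt` on a line of its own, and its later `git add new.txt` evidently did not run inside the worktree. One explanation consistent with that score, though this page does not say how the harness executes model output, is that each line runs in a separate shell started from the main repository, so a `cd` does not carry over to the next line. The same assumption would also explain why `laguna-xs.2:none__json_schema`, which only writes `new.txt` after its `cd`, fails both checks. The sketch below simulates that assumption with one `sh -c` per line (the `run_line` helper is hypothetical, not part of GitBench), then shows that the same `cd`, `add`, and `commit` succeed when they share one shell. Using `git -C`, as the expected answer does, avoids depending on the working directory at all.

```shell
#!/bin/sh
# Sketch: why a standalone `cd` line may not help.
# Assumption (not confirmed by GitBench): each output line runs in its own
# shell whose working directory is the main repo.
set -eu

work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'Hello' > hello.txt && git add hello.txt && git commit -q -m 'Initial commit'
git checkout -q -b feature
echo 'Feature' > feature.txt && git add feature.txt && git commit -q -m 'Add feature'
git checkout -q main

# Hypothetical helper: run one line in a fresh shell, report ok/failed.
run_line() { sh -c "$1" >/dev/null 2>&1 && echo ok || echo failed; }

# poolside/laguna-xs.2:none output, one process per line.
r1=$(run_line 'git worktree add ../feature-wt feature')
r2=$(run_line 'echo "new file" > ../feature-wt/new.txt')
r3=$(run_line 'cd ../feature-wt')
r4=$(run_line 'git add new.txt')                # runs in main repo: no new.txt
r5=$(run_line 'git commit -m "Add new file"')   # nothing staged in main
separate_log=$(git -C ../feature-wt log --oneline -1)

# Same cd/add/commit, but in one shell: the cd now applies.
r6=$(run_line 'cd ../feature-wt && git add new.txt && git commit -m "Add new file"')
shared_log=$(git -C ../feature-wt log --oneline -1)

cd /
rm -rf "$work"
echo "$r1 $r2 $r3 $r4 $r5 | $r6"
```

Under this simulation the script prints `ok ok ok failed failed | ok`: the `cd` itself succeeds, but its effect ends with its own shell.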
deepseek/deepseek-v4-flash:high FAIL 0% 170 in → 1,113 out (1,070 reasoning)
git worktree add -b feature ../feature-wt
cd ../feature-wt
echo 'new file' > new.txt
git add new.txt
git commit -m "Add new file"
Failure: Failed assertions: [{'type': 'file_content', 'path': '../feature-wt/new.txt', 'value': 'new file'}, {'type': 'git_output', 'command': 'git -C ../feature-wt log --oneline -1', 'contains': 'Add new file'}]
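`deepseek-v4-flash:high` (and, further down, `laguna-xs.2:high`) started with `git worktree add -b feature ../feature-wt`. `-b` asks Git to create a new branch, and `feature` already exists from the setup, so the command fails and the worktree is never set up; both checks then fail. Passing the existing branch as the last argument, as the expected answer does, checks it out instead. A sketch of both forms, assuming a local copy of the starting repo in a temporary directory:

```shell
#!/bin/sh
# Sketch: -b creates a branch, so it fails when 'feature' already exists.
set -eu

work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'Hello' > hello.txt && git add hello.txt && git commit -q -m 'Initial commit'
git checkout -q -b feature
echo 'Feature' > feature.txt && git add feature.txt && git commit -q -m 'Add feature'
git checkout -q main

# Model's first line: try to create 'feature' again.
git worktree add -b feature ../feature-wt >/dev/null 2>&1 \
  && with_b=ok || with_b=failed

# Expected answer's first line: check out the existing branch.
git worktree add ../feature-wt feature >/dev/null 2>&1 \
  && without_b=ok || without_b=failed
wt_branch=$(git -C ../feature-wt rev-parse --abbrev-ref HEAD)
wt_has_feature_txt=no
[ -f ../feature-wt/feature.txt ] && wt_has_feature_txt=yes

cd /
rm -rf "$work"
echo "with -b: $with_b; without -b: $without_b; worktree on $wt_branch"
```

The worktree created without `-b` is on `feature` and already contains `feature.txt` from the `Add feature` commit.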
Invalid JSON. Output:
JSON Schema Structured Output
Structured Output Error
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
mistralai/devstral-2512 FAIL 0% 164 in → 54 out
```bash git worktree add ../feature-wt feature echo "new file" > ../feature-wt/new.txt git -C ../feature-wt add new.txt git -C ../feature-wt commit -m "Add new file" ```
Failure: Failed assertions: [{'type': 'file_content', 'path': '../feature-wt/new.txt', 'value': 'new file'}, {'type': 'git_output', 'command': 'git -C ../feature-wt log --oneline -1', 'contains': 'Add new file'}]
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 176 in → 56 out (0 reasoning)
git worktree add ../feature-wt <commit>
git -C ../feature-wt write-new-file64 new.txt "new file"
git -C ../feature-wt add new.txt
git -C ../feature-wt commit -m "Add new file"
Failure: Failed assertions: [{'type': 'file_content', 'path': '../feature-wt/new.txt', 'value': 'new file'}, {'type': 'git_output', 'command': 'git -C ../feature-wt log --oneline -1', 'contains': 'Add new file'}]
nvidia/nemotron-3-nano-30b-a3b:none__json_schema FAIL 0% 180 in → 74 out (0 reasoning)
git worktree add '../feature-wt' HEAD^
echo 'new file' > '../feature-wt/new.txt'
git -C '../feature-wt' add 'new.txt'
git -C '../feature-wt' commit -m 'Add new file'
JSON Schema Structured Output
(raw) { "commands": [ "git worktree add '../feature-wt' HEAD^", "echo 'new file' > '../feature-wt/new.txt'", "git -C '../feature-wt' add 'new.txt'", "git -C '../feature-wt' commit -m 'Add new file'" ] }
Failure: Failed assertions: [{'type': 'file_content', 'path': '../feature-wt/new.txt', 'value': 'new file'}, {'type': 'git_output', 'command': 'git -C ../feature-wt log --oneline -1', 'contains': 'Add new file'}]
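`nemotron-3-nano-30b-a3b:none__json_schema` passed `HEAD^` instead of `feature`. When the prompt is shown, `HEAD` is `main` (step 11), and `main` holds only `Initial commit`, so `HEAD^` names no commit and `git worktree add` fails; the remaining `git -C '../feature-wt'` commands, otherwise well formed, then have no worktree to act on. A quick check, assuming a local copy of the starting repo in a temporary directory:

```shell
#!/bin/sh
# Sketch: on 'main' (one commit), HEAD^ does not resolve.
set -eu

work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'Hello' > hello.txt && git add hello.txt && git commit -q -m 'Initial commit'
git checkout -q -b feature
echo 'Feature' > feature.txt && git add feature.txt && git commit -q -m 'Add feature'
git checkout -q main

# Does each candidate name a commit? (--verify -q: rely on exit status.)
git rev-parse --verify -q 'HEAD^' >/dev/null && parent=yes || parent=no
git rev-parse --verify -q feature >/dev/null && feature_ref=yes || feature_ref=no

# The model's first command, which therefore fails.
git worktree add ../feature-wt 'HEAD^' >/dev/null 2>&1 \
  && wt_add=ok || wt_add=failed

cd /
rm -rf "$work"
echo "HEAD^ resolves: $parent; feature resolves: $feature_ref; worktree add: $wt_add"
```

It prints `HEAD^ resolves: no; feature resolves: yes; worktree add: failed`.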
poolside/laguna-xs.2:high FAIL 0% 207 in → 694 out (638 reasoning)
``` git worktree add -b feature ../feature-wt echo 'new file' > ../feature-wt/new.txt git -C ../feature-wt add new.txt git -C ../feature-wt commit -m 'Add new file' ```
Failure: Failed assertions: [{'type': 'file_content', 'path': '../feature-wt/new.txt', 'value': 'new file'}, {'type': 'git_output', 'command': 'git -C ../feature-wt log --oneline -1', 'contains': 'Add new file'}]
poolside/laguna-xs.2:none__json_schema FAIL 0% 208 in → 50 out (0 reasoning)
git worktree add '../feature-wt' feature
cd '../feature-wt'
echo 'new file' > new.txt
git add new.txt
git commit -m 'Add new file'
JSON Schema Structured Output
(raw) {"commands": ["git worktree add '../feature-wt' feature", "cd '../feature-wt'", "echo 'new file' > new.txt", "git add new.txt", "git commit -m 'Add new file'"]}
Failure: Failed assertions: [{'type': 'file_content', 'path': '../feature-wt/new.txt', 'value': 'new file'}, {'type': 'git_output', 'command': 'git -C ../feature-wt log --oneline -1', 'contains': 'Add new file'}]