GitBench
Rebase/Squash style - intermediate commits
Tests the ability to identify intermediate, rebase-style commits that should be squashed, and evaluates whether the model recognizes incomplete or intermediate work.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. touch initial.txt
  5. git add initial.txt
  6. git commit -m 'Add initial file'
  7. echo 'line1' > data.txt
  8. git add data.txt
  9. git commit -m 'WIP: adding data'
  10. echo 'line1 line2' > data.txt
  11. git add data.txt
  12. git commit -m 'WIP: continue data'
  13. echo 'line1 line2 line3' > data.txt
  14. git add data.txt
  15. git commit -m 'Add complete data'
Prompt
This repository has a series of commits that incrementally build up a feature. Identify the WIP commits that should be marked as squash/fixup into a single clean commit. Do not include the final completed-feature commit. Output ONLY the selected commits, one per line, with no explanation.
Expected
WIP: adding data, WIP: continue data
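The selection the prompt asks for amounts to filtering the log by commit subject. The following is a minimal sketch, assuming the WIP commits can be recognized by a `WIP:` subject prefix and that the log is available as `<short-hash> <subject>` lines (for example from `git log --format='%h %s'`). The function name `select_wip_commits` is hypothetical and not part of the benchmark, and `0000000` is a placeholder hash for the final commit.

```python
def select_wip_commits(log_text: str, prefix: str = "WIP:") -> list[str]:
    """Return the log lines whose subject starts with `prefix`, in log order."""
    selected = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split "<hash> <subject>" once; the subject itself may contain spaces.
        _commit_hash, _, subject = line.partition(" ")
        if subject.startswith(prefix):
            selected.append(line)
    return selected


# Sample log, newest first. 0000000 is a placeholder hash (assumption).
sample_log = """\
0000000 Add complete data
b453c18 WIP: continue data
b88f3e1 WIP: adding data
ad632ac Add initial file
"""

print("\n".join(select_wip_commits(sample_log)))
```

Under these assumptions the sketch prints the two `WIP:` lines and leaves out both the initial commit and the final `Add complete data` commit, which matches the expected answer.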
deepseek/deepseek-v4-flash:high PASS 100% 107 in → 226 out (228 reasoning)
0b70541 ef9de4a
deepseek/deepseek-v4-flash:none PASS 100% 108 in → 28 out (0 reasoning)
b453c18 WIP: continue data b88f3e1 WIP: adding data ad632ac Add initial file
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 106 in → 32 out (0 reasoning)
80686d1 WIP: continue data 10bc349 WIP: adding data
JSON Schema Structured Output
(raw) { "commits": [ "80686d1 WIP: continue data", "10bc349 WIP: adding data" ] }
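The structured outputs in this report take two shapes: a bare short hash, or a short hash followed by the commit subject. A minimal sketch of normalizing both to short hashes follows. It assumes the response is a JSON object with a `commits` array of strings, as in the raw output above; `extract_hashes` is a hypothetical helper, not the benchmark's own parser.

```python
import json


def extract_hashes(raw: str) -> list[str]:
    """Parse a {"commits": [...]} response and return the short hash of each entry."""
    data = json.loads(raw)
    hashes = []
    for entry in data["commits"]:
        # Keep only the first whitespace-delimited token, so that
        # "80686d1 WIP: continue data" and "80686d1" both yield "80686d1".
        parts = entry.split()
        if parts:
            hashes.append(parts[0])
    return hashes


raw = '{ "commits": [ "80686d1 WIP: continue data", "10bc349 WIP: adding data" ] }'
print(extract_hashes(raw))  # ['80686d1', '10bc349']
```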
mistralai/devstral-2512 PASS 100% 118 in → 14 out
d4bcba5 0d31c9e
mistralai/devstral-2512__json_schema PASS 100% 119 in → 23 out
c4bb6c5 25f12e9
JSON Schema Structured Output
(raw) {"commits": ["c4bb6c5", "25f12e9"]}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 132 in → 563 out (457 reasoning)
d708b68 26db264
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 129 in → 552 out (489 reasoning)
8f860bd 6a7fa9c
JSON Schema Structured Output
(raw) { "commits": [ "8f860bd", "6a7fa9c" ] }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 131 in → 22 out (0 reasoning)
792e6c2 f56f5b7 8af2cec
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 132 in → 30 out (0 reasoning)
4982fb9 4185022
JSON Schema Structured Output
(raw) { "commits": [ "4982fb9", "4185022" ] }
poolside/laguna-xs.2:high PASS 100% 160 in → 222 out (203 reasoning)
30f5646 53c606c
poolside/laguna-xs.2:high__json_schema PASS 100% 161 in → 264 out (236 reasoning)
7780405 b12947a
JSON Schema Structured Output
(raw) { "commits": ["7780405", "b12947a"] }
poolside/laguna-xs.2:none PASS 100% 158 in → 28 out (0 reasoning)
``` cb86f7b WIP: continue data 4480f31 WIP: adding data ```
poolside/laguna-xs.2:none__json_schema PASS 100% 157 in → 21 out (0 reasoning)
c36c012 6d3cf89
JSON Schema Structured Output
(raw) {"commits": ["c36c012", "6d3cf89"]}
Invalid JSON. Output:
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
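The message `Expecting value: line 1 column 1 (char 0)` is what Python's `json` module reports when the input does not begin with a JSON value, for example an empty response or one wrapped in a Markdown code fence. The sketch below reproduces it. The report does not state what the failing model returned or how the harness is implemented, so `try_parse` and both sample inputs are assumptions for illustration only.

```python
import json


def try_parse(raw: str):
    """Return (data, None) on success or (None, error message) on failure."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        return None, f"Failed to parse structured JSON response: {exc}"


# An empty response has no JSON value at position 0.
print(try_parse("")[1])

# A fenced response fails the same way, because the first character is a backtick.
fenced = "`" * 3 + 'json\n{"commits": []}\n' + "`" * 3
print(try_parse(fenced)[1])
```

Both calls produce the same message as the failure recorded above, so the message alone does not distinguish an empty response from one with leading non-JSON text.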