GitBench
Multiple separate feature development commits
Tests ability to squash multiple separate feature-development commits into logical groups. Evaluates feature-level commit organization.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. touch base.txt
  5. git add base.txt
  6. git commit -m 'Base commit'
  7. echo 'feat1a' > feature1.txt
  8. git add feature1.txt
  9. git commit -m 'WIP: start feature 1'
  10. echo 'feat1b' >> feature1.txt
  11. git add feature1.txt
  12. git commit -m 'WIP: continue feature 1'
  13. echo 'feat2a' > feature2.txt
  14. git add feature2.txt
  15. git commit -m 'Start feature 2'
Prompt
The second feature was done cleanly in one commit, but feature 1 has WIP commits that should be marked as squash/fixup. Identify only the WIP commits for feature 1 that should be combined. Do not include the clean feature 2 commit. Provide the commit messages.
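To make "marked as squash/fixup" concrete, the sketch below builds the todo lines that `git rebase -i` would be given for this history. It is an illustration only: the hashes are placeholders (real abbreviated hashes differ on every run), and `rebase_todo` and its `marker` parameter are hypothetical names, not part of GitBench or Git.

```python
# Commits from the setup, oldest first. The hashes are placeholders:
# real abbreviated hashes differ on every run.
history = [
    ("aaaaaaa", "Base commit"),
    ("bbbbbbb", "WIP: start feature 1"),
    ("ccccccc", "WIP: continue feature 1"),
    ("ddddddd", "Start feature 2"),
]


def rebase_todo(commits, marker="WIP:"):
    """Build todo lines for `git rebase -i`, folding runs of WIP commits."""
    lines = []
    in_wip_run = False
    for sha, message in commits:
        is_wip = message.startswith(marker)
        # The first WIP commit of a run stays `pick`; later ones fold into it.
        action = "fixup" if (is_wip and in_wip_run) else "pick"
        lines.append(f"{action} {sha} {message}")
        in_wip_run = is_wip
    return lines


# Rebase onto the base commit, so it is not part of the todo list.
todo = rebase_todo(history[1:])
print("\n".join(todo))
```

With `fixup`, the later commit's message is discarded, so the combined commit would still be titled "WIP: start feature 1" unless its line is changed from `pick` to `reword`.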
Expected
WIP: start feature 1, WIP: continue feature 1
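The passing responses list the two messages in either order, which suggests the comparison is order-insensitive. A minimal sketch of such a check follows, assuming set semantics; `score_selection` is a hypothetical helper and not GitBench's actual scorer, which this page does not show.

```python
EXPECTED = {"WIP: start feature 1", "WIP: continue feature 1"}


def score_selection(selected, expected=EXPECTED):
    """Return (passed, missing, extra) for an order-insensitive comparison."""
    chosen = {message.strip() for message in selected}
    missing = sorted(expected - chosen)  # expected messages not selected
    extra = sorted(chosen - expected)    # selected messages not expected
    return (not missing and not extra, missing, extra)


print(score_selection(["WIP: continue feature 1", "WIP: start feature 1"]))
print(score_selection(["WIP: start feature 1", "Start feature 2"]))
```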
deepseek/deepseek-v4-flash:high PASS 100% 115 in → 180 out (159 reasoning)
"WIP: start feature 1" and "WIP: continue feature 1"
deepseek/deepseek-v4-flash:none PASS 100% 110 in → 56 out (0 reasoning)
Based on the request, the WIP commits for feature 1 that should be combined (squashed/fixup) are: - `77b1d41 WIP: continue feature 1` - `c524de9 WIP: start feature 1`
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 117 in → 34 out (0 reasoning)
5cb2e0c WIP: continue feature 1 bd75c08 WIP: start feature 1
JSON Schema Structured Output
(raw) { "commits": ["5cb2e0c WIP: continue feature 1", "bd75c08 WIP: start feature 1"] }
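Some structured outputs prefix each message with an abbreviated hash while others return the bare message, and both pass. The sketch below shows one way to normalize the `commits` array before comparison. The hash-stripping rule is an assumption about how such entries could be handled, and `commit_messages` is a hypothetical helper.

```python
import json
import re

# Optional leading abbreviated hash (7 to 40 lowercase hex characters)
# followed by whitespace. A message whose first word happens to be such a
# hex string would also be stripped, so this is a heuristic.
HASH_PREFIX = re.compile(r"^[0-9a-f]{7,40}\s+")


def commit_messages(raw):
    """Parse a structured response and drop any leading hash from each entry."""
    payload = json.loads(raw)
    return [HASH_PREFIX.sub("", entry) for entry in payload["commits"]]


raw = '{ "commits": ["5cb2e0c WIP: continue feature 1", "bd75c08 WIP: start feature 1"] }'
print(commit_messages(raw))
```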
mistralai/devstral-2512 FAIL 100% 123 in → 144 out
Based on the git commit history provided, the WIP commits for feature 1 that should be combined (squashed/fixup) are: 1. `99d2a1a WIP: continue feature 1` 2. `d473830 WIP: start feature 1` These are the only commits related to feature 1 that contain "WIP" in their messages and should be squashed together. The commit `93544bb Start feature 2` is for feature 2 and should not be included, as requested. The base commit `b19e8f4` is also not part of feature 1's development.
Failure: Extra selected commit messages: ['Start feature 2', 'Base commit']
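The response above names the correct two commits, yet the failure lists 'Start feature 2' and 'Base commit' as extra selections. Both are mentioned in the prose only to say they are excluded. One reading consistent with this, and it is only an assumption since the scorer is not shown, is that free-form responses are scanned for every known commit message, case-insensitively. The sketch below reproduces that behavior; `KNOWN_MESSAGES` and `extra_messages` are hypothetical names.

```python
# Known messages in `git log` order (newest first).
KNOWN_MESSAGES = [
    "Start feature 2",
    "WIP: continue feature 1",
    "WIP: start feature 1",
    "Base commit",
]
EXPECTED = {"WIP: start feature 1", "WIP: continue feature 1"}


def extra_messages(response):
    """List unexpected commit messages that appear anywhere in the response."""
    text = response.lower()
    mentioned = [m for m in KNOWN_MESSAGES if m.lower() in text]
    return [m for m in mentioned if m not in EXPECTED]


# Excerpt of the failing response quoted above.
response = (
    "1. `99d2a1a WIP: continue feature 1` 2. `d473830 WIP: start feature 1` "
    "The commit `93544bb Start feature 2` is for feature 2 and should not be "
    "included, as requested. The base commit `b19e8f4` is also not part of "
    "feature 1's development."
)
print(extra_messages(response))
```

Under this assumed rule, a response that explains why a commit is excluded is penalized for mentioning it, while the same answer in a JSON schema passes because only the `commits` array is read.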
mistralai/devstral-2512__json_schema PASS 100% 124 in → 38 out
f84d62e WIP: continue feature 1 580c00b WIP: start feature 1
JSON Schema Structured Output
(raw) {"commits": ["f84d62e WIP: continue feature 1", "580c00b WIP: start feature 1"]}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 132 in → 229 out (189 reasoning)
- WIP: continue feature 1 - WIP: start feature 1
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 134 in → 284 out (245 reasoning)
WIP: continue feature 1 WIP: start feature 1
JSON Schema Structured Output
(raw) { "commits": [ "WIP: continue feature 1", "WIP: start feature 1" ] }
nvidia/nemotron-3-nano-30b-a3b:none FAIL 100% 134 in → 655 out (0 reasoning)
Based on your requirements to **only identify WIP commits for feature 1** (ignoring feature 2) and **provide their commit messages**, here's the precise analysis:
### Key Observations:
- The **clean feature 2 commit** (`b1cca79 Start feature 2`) is to be **excluded entirely**.
- **Feature 1 WIP commits** are commits marked with "WIP" *specifically related to feature 1* (not for feature 2).
- **Not all "WIP" commits involve feature 1** (e.g., a "WIP" commit for feature 2 would be irrelevant here).
### Commit Breakdown:
| SHA | Message | Reason to Include? |
|-------------|----------------------------------|-------------------|
| `b1cca79` | `Start feature 2` | ❌ **Feature 2** → Excluded per instructions. |
| `8ed5d19` | `WIP: continue feature 1` | ✅ **Feature 1** → Valid WIP commit to squash. |
| `f581008` | `WIP: start feature 1` | ✅ **Feature 1** → Valid WIP commit to squash. |
| `e561418` | `Base commit` | ❌ Not a WIP → Base commit (keep as-is). |
### Identified WIP Commits for Feature 1 (Only):
These are the **two commits** that need to be combined into a single commit. Their messages are exactly as written in the history.
> **Commit messages to squash/fixup**:
> `WIP: continue feature 1`
> `WIP: start feature 1`
### Why This Is Correct:
1. **Feature 1 WIP commits are isolated** to `f581008` and `8ed5d19` (both explicitly say "feature 1").
2. **Feature 2 is clean and excluded** (`b1cca79` is not a WIP commit and relates to feature 2).
3. **No other commits** involve feature 1 or are WIPs (e.g., `e561418` is a base commit, not WIP).
### Action Required:
When rewriting history, **only these two commit messages** should be considered for squashing/falshing. The final commit tree will merge them into one clean feature 1 commit (e.g., "Build feature 1 functionality").
> 💡 **Note**: The order in the history (newest first) means `8ed5d19` (continue feature 1) came *after* `f581008` (start feature 1).
When squashing, the **earlier commit** (`f581008`) will typically be the base, and `8ed5d19` will be applied on top. But **only the messages** are requested here.
Failure: Extra selected commit messages: ['Start feature 2', 'Base commit']
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 132 in → 31 out (0 reasoning)
WIP: continue feature 1 WIP: start feature 1
JSON Schema Structured Output
(raw) { "commits": [ "WIP: continue feature 1", "WIP: start feature 1" ] }
poolside/laguna-xs.2:high PASS 100% 162 in → 327 out (267 reasoning)
The WIP commits for feature 1 that should be marked as squash/fixup are: 1. **be2d8f9** - `WIP: continue feature 1` 2. **1719afc** - `WIP: start feature 1`
poolside/laguna-xs.2:high__json_schema PASS 100% 160 in → 259 out (225 reasoning)
WIP: continue feature 1 WIP: start feature 1
JSON Schema Structured Output
(raw) { "commits": [ "WIP: continue feature 1", "WIP: start feature 1" ] }
poolside/laguna-xs.2:none FAIL 100% 164 in → 92 out (0 reasoning)
Looking at the commit history, I can identify the WIP commits for feature 1 that should be combined: 1. `WIP: continue feature 1` 2. `WIP: start feature 1` These two commits are marked as WIP and relate to feature 1, so they should be squashed or marked as fixup commits. The clean feature 2 commit (`Start feature 2`) and the base commit should remain separate.
Failure: Extra selected commit messages: ['Start feature 2', 'Base commit']
poolside/laguna-xs.2:none__json_schema PASS 100% 161 in → 33 out (0 reasoning)
WIP: start feature 1 WIP: continue feature 1
JSON Schema Structured Output
(raw) { "commits": [ "WIP: start feature 1", "WIP: continue feature 1" ] }
Invalid JSON. Output:
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
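The wording "Expecting value: line 1 column 1 (char 0)" matches what Python's `json` module reports when the input is empty or begins with a character that cannot start a JSON value. The sketch below reproduces the message with an empty response. That the harness is written in Python and that the response was empty are both assumptions, and `parse_commits` is a hypothetical helper.

```python
import json


def parse_commits(raw):
    """Return (commits, error); exactly one of the two is None."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        # str(exc) has the form "<msg>: line L column C (char N)".
        return None, f"Failed to parse structured JSON response: {exc}"
    return payload.get("commits", []), None


commits, error = parse_commits("")
print(error)
```

Any response that opens with prose rather than `{` produces the same message, since parsing fails at the first character.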