GitBench
Subdirectory changes
Tests the ability to generate a commit message for changes confined to a subdirectory (src/). Evaluates whether the description is path-aware.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. mkdir -p src/components
  5. echo 'component1' > src/components/Component1.jsx
  6. echo 'component2' > src/components/Component2.jsx
  7. git add src/
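To reproduce the starting state locally, the setup commands above can be run as one script. This is a minimal sketch, not the benchmark's own harness: the temporary directory, the `-q` flag, and the final `git diff --cached --name-status` inspection step are assumptions added for illustration.

```shell
#!/bin/sh
set -eu

# Assumption: work in a throwaway directory so no existing repo is touched.
workdir=$(mktemp -d)
cd "$workdir"

# The setup commands from the list above (-q only silences git's banner).
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
mkdir -p src/components
echo 'component1' > src/components/Component1.jsx
echo 'component2' > src/components/Component2.jsx
git add src/

# Assumption: inspect the staged paths; "A" marks a newly added file.
staged=$(git diff --cached --name-status)
printf '%s\n' "$staged"
```

Both staged files sit under src/components/, which is the path information a path-aware commit message would be expected to reflect.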
Prompt
Generate a concise commit message (max 50 characters) describing the changes in this diff. Focus on the action taken (e.g., 'Add', 'Fix', 'Update').
Expected
Add React components to src
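The 50-character limit in the prompt is easy to check mechanically. The sketch below is only an illustration of that one constraint; `check_subject` is a hypothetical helper, and nothing here is claimed to be how GitBench computes its scores.

```shell
#!/bin/sh

# Hypothetical helper: succeeds when the message is at most $2 characters
# long (default 50, the limit stated in the prompt).
check_subject() {
  msg=$1
  max=${2:-50}
  [ "${#msg}" -le "$max" ]
}

# Check the expected answer and one of the longer model outputs.
for msg in 'Add React components to src' 'Add Component1.jsx and Component2.jsx'; do
  if check_subject "$msg"; then
    printf 'ok   (%d chars) %s\n' "${#msg}" "$msg"
  else
    printf 'long (%d chars) %s\n' "${#msg}" "$msg"
  fi
done
```

A length check like this says nothing about whether the message mentions the src/ path, so it would complement, not replace, a content comparison against the expected message.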
deepseek/deepseek-v4-flash:high__json_schema PASS 92% 180 in → 230 out (207 reasoning)
Add Component1 and Component2
JSON Schema Structured Output
(raw) { "commit_message": "Add Component1 and Component2" }
deepseek/deepseek-v4-flash:high PASS 91.7% 180 in → 118 out (110 reasoning)
Add Component1 and Component2
nvidia/nemotron-3-nano-30b-a3b:high PASS 91.7% 204 in → 229 out (195 reasoning)
Add Component1 and Component2
nvidia/nemotron-3-nano-30b-a3b:none PASS 91.3% 204 in → 7 out (0 reasoning)
Add Component1 and Component2
mistralai/devstral-2512 PASS 91% 191 in → 7 out
Add Component1 and Component2
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 91% 204 in → 13 out (0 reasoning)
Add Component1, Component2
JSON Schema Structured Output
(raw) {"commit_message": "Add Component1, Component2"}
poolside/laguna-xs.2:none__json_schema PASS 91% 232 in → 13 out (0 reasoning)
Add component1 and component2
JSON Schema Structured Output
(raw) {"commit_message": "Add component1 and component2"}
deepseek/deepseek-v4-flash:none__json_schema PASS 90.7% 180 in → 16 out (0 reasoning)
Add Component1 and Component2
JSON Schema Structured Output
(raw) { "commit_message": "Add Component1 and Component2" }
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 90% 204 in → 719 out (485 reasoning)
Add Component1.jsx and Component2.jsx
JSON Schema Structured Output
(raw) { "commit_message": "Add Component1.jsx and Component2.jsx" }
poolside/laguna-xs.2:high__json_schema PASS 90% 232 in → 229 out (215 reasoning)
Add Component1 and Component2
JSON Schema Structured Output
(raw) {"commit_message": "Add Component1 and Component2"}
deepseek/deepseek-v4-flash:none PASS 89.3% 180 in → 3 out (0 reasoning)
Add initial components
mistralai/devstral-2512__json_schema PASS 88.3% 191 in → 13 out
Add Component1 and Component2
JSON Schema Structured Output
(raw) {"commit_message": "Add Component1 and Component2"}
poolside/laguna-xs.2:high PASS 83.3% 232 in → 260 out (250 reasoning)
Add Component1 and Component2 files
poolside/laguna-xs.2:none PASS 76.7% 232 in → 9 out (0 reasoning)
Add Component1 and Component2 files