GitBench
Documentation update
Tests the ability to generate a commit message for a documentation update (README). Evaluates whether the model recognizes documentation-only changes.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. printf '# Project\nThis is a project.\n' > README.md
  5. git add README.md
  6. git commit -m 'Initial commit'
  7. printf '# Project\nThis is a project.\n\n## Installation\nRun `pip install` to install dependencies.\n' > README.md
  8. git add README.md
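The setup steps above can be reproduced as one script. This is a minimal sketch, assuming Git is installed. The temporary directory (`repo`), the `-q` flags, and the final `git diff --cached` are additions for illustration and are not part of the listed commands. The exact diff text the benchmark passes to the model is not shown on this page, so the last command only approximates it.

```shell
#!/bin/sh
set -eu

# Assumption: work in a throwaway directory so no existing repository is touched.
repo=$(mktemp -d)
cd "$repo"

# Steps 1-3: create the repository and set a local identity.
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'

# Steps 4-6: commit the initial README.
printf '# Project\nThis is a project.\n' > README.md
git add README.md
git commit -q -m 'Initial commit'

# Steps 7-8: overwrite the README with an added Installation section and stage it.
printf '# Project\nThis is a project.\n\n## Installation\nRun `pip install` to install dependencies.\n' > README.md
git add README.md

# Illustration only: print the staged change that a commit message would describe.
git diff --cached
```

After the script runs, the repository has one commit and one staged, uncommitted change to README.md, which is the state the commit message is meant to describe.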
Prompt
Generate a concise commit message (max 50 characters) describing the changes in this diff. Focus on the action taken (e.g., 'Add', 'Fix', 'Update').
Expected
Add installation section to README
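The prompt's 50-character limit can be checked mechanically. The sketch below uses a hypothetical helper, `within_limit`, that is not part of GitBench. It only tests message length and says nothing about how the scores on this page are computed.

```shell
#!/bin/sh

# Hypothetical helper: succeeds when the message is at most 50 characters long.
within_limit() {
  [ "${#1}" -le 50 ]
}

msg='Add installation section to README'

if within_limit "$msg"; then
  echo "ok: ${#msg} characters"
else
  echo "too long: ${#msg} characters"
fi
```

For the expected message this takes the `ok` branch, since the message is 34 characters long.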
mistralai/devstral-2512__json_schema PASS 93.3% 121 in → 12 out
Add installation instructions to README
JSON Schema Structured Output
(raw) {"commit_message": "Add installation instructions to README"}
deepseek/deepseek-v4-flash:high__json_schema PASS 92.5% 120 in → 198 out (183 reasoning)
Add installation instructions to README
JSON Schema Structured Output
(raw) { "commit_message": "Add installation instructions to README" }
poolside/laguna-xs.2:none__json_schema PASS 92.3% 172 in → 12 out (0 reasoning)
Add installation section to README
JSON Schema Structured Output
(raw) {"commit_message": "Add installation section to README"}
deepseek/deepseek-v4-flash:none__json_schema PASS 91.7% 188 in → 13 out (0 reasoning)
Add installation instructions to README
JSON Schema Structured Output
(raw) {"commit_message": "Add installation instructions to README"}
poolside/laguna-xs.2:high__json_schema PASS 91% 172 in → 277 out (264 reasoning)
Add Installation section to README
JSON Schema Structured Output
(raw) {"commit_message": "Add Installation section to README"}
poolside/laguna-xs.2:high PASS 90% 172 in → 303 out (295 reasoning)
Add Installation section to README
poolside/laguna-xs.2:none PASS 90% 172 in → 7 out (0 reasoning)
Update README with installation instructions
deepseek/deepseek-v4-flash:high PASS 89.3% 120 in → 369 out (361 reasoning)
Add installation instructions to README
mistralai/devstral-2512 PASS 89% 121 in → 8 out
"Add installation instructions to README"
nvidia/nemotron-3-nano-30b-a3b:high PASS 89% 134 in → 225 out (166 reasoning)
Add Installation section to README
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 88.3% 134 in → 76 out (72 reasoning)
Add Installation section to README
JSON Schema Structured Output
(raw) { "commit_message": "Add Installation section to README" }
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 75% 134 in → 14 out (0 reasoning)
Update: Add Installation section
JSON Schema Structured Output
(raw) { "commit_message": "Update: Add Installation section" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 71.7% 134 in → 9 out (0 reasoning)
Add Installation section with pip install instructions.
deepseek/deepseek-v4-flash:none PASS 61.3% 120 in → 6 out (0 reasoning)
Update README with installation instructions