GitBench
Multiple files added
Tests the model's ability to generate a commit message that summarizes multiple new files (config.py, main.py, utils.py). Evaluates how well it groups related file additions.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'config' > config.py
  5. echo 'main' > main.py
  6. echo 'utils' > utils.py
  7. git add config.py main.py utils.py
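To reproduce the starting state locally, the setup steps above can be replayed in a scratch directory. The following is a minimal sketch, not GitBench's own harness: the names FILES and build_repo are hypothetical, and reading the staged changes with `git diff --cached` is an assumption, since the page does not say how the diff shown to the model is produced.

```python
import subprocess
import tempfile
from pathlib import Path

# Contents written by the three `echo ... > file` setup steps
# (echo appends a trailing newline).
FILES = {"config.py": "config\n", "main.py": "main\n", "utils.py": "utils\n"}


def build_repo(repo: Path) -> str:
    """Replay the setup steps inside `repo` and return the staged diff."""

    def git(*args: str) -> str:
        # Run one git command in the repo and return its standard output.
        result = subprocess.run(
            ["git", *args], cwd=repo, check=True, capture_output=True, text=True
        )
        return result.stdout

    git("init")
    git("config", "user.email", "test@test.com")
    git("config", "user.name", "Test User")
    for name, content in FILES.items():
        (repo / name).write_text(content)
    git("add", *FILES)
    # Assumption: the staged diff is what the prompt refers to as "this diff".
    return git("diff", "--cached")
```

Calling build_repo(Path(tempfile.mkdtemp())) on a machine with Git installed should return a diff in which all three files appear as new files.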
Prompt
Generate a concise commit message (max 50 characters) describing the changes in this diff. Focus on the action taken (e.g., 'Add', 'Fix', 'Update').
Expected
Add initial project files
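The two constraints stated in the prompt (at most 50 characters, leading action verb) can be checked mechanically. The sketch below is illustrative only: check_message and ACTION_VERBS are hypothetical names, the verb set contains just the three examples the prompt lists, and nothing here reproduces how GitBench computes its percentage scores, which this page does not describe.

```python
MAX_LEN = 50  # character limit stated in the prompt

# Only the verbs the prompt gives as examples; a real checker would
# likely accept more (assumption).
ACTION_VERBS = {"Add", "Fix", "Update"}


def check_message(message: str, max_len: int = MAX_LEN) -> dict:
    """Return simple constraint checks for one commit message."""
    # Tolerate surrounding whitespace and a reply wrapped in double quotes.
    text = message.strip().strip('"')
    first_word = text.split(" ", 1)[0] if text else ""
    return {
        "length": len(text),
        "within_limit": len(text) <= max_len,
        "starts_with_action": first_word in ACTION_VERBS,
    }


result = check_message("Add initial project files")
print(result)
```

For the expected message, the check reports a length of 25 characters, which is within the limit, and a recognized leading verb.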
nvidia/nemotron-3-nano-30b-a3b:high PASS 91.7% 239 in → 432 out (322 reasoning)
Add config, main, utils modules
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 91.3% 239 in → 16 out (0 reasoning)
Add config.py, main.py, and utils.py
JSON Schema Structured Output
(raw) {"commit_message":"Add config.py, main.py, and utils.py"}
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 91% 239 in → 378 out (267 reasoning)
Add config, main, utils modules
JSON Schema Structured Output
(raw) { "commit_message": "Add config, main, utils modules" }
poolside/laguna-xs.2:high PASS 91% 274 in → 307 out (298 reasoning)
Add config, main, utils
deepseek/deepseek-v4-flash:high PASS 90% 213 in → 273 out (267 reasoning)
Add initial project files
deepseek/deepseek-v4-flash:none PASS 89.3% 196 in → 5 out (0 reasoning)
Add initial project files
poolside/laguna-xs.2:none__json_schema PASS 88.3% 274 in → 15 out (0 reasoning)
Add config, main, and utils files
JSON Schema Structured Output
(raw) {"commit_message": "Add config, main, and utils files"}
mistralai/devstral-2512 PASS 87.7% 226 in → 11 out
"Add config, main, and utils files"
nvidia/nemotron-3-nano-30b-a3b:none PASS 87.7% 239 in → 7 out (0 reasoning)
Add config, main, utils
deepseek/deepseek-v4-flash:high__json_schema PASS 85% 198 in → 244 out (228 reasoning)
Add initial config, main, and utils files
JSON Schema Structured Output
(raw) {"commit_message":"Add initial config, main, and utils files"}
deepseek/deepseek-v4-flash:none__json_schema PASS 85% 196 in → 16 out (0 reasoning)
Add initial config, main, and utils files
JSON Schema Structured Output
(raw) {"commit_message": "Add initial config, main, and utils files"}
poolside/laguna-xs.2:none PASS 85% 274 in → 11 out (0 reasoning)
Add initial config, main, and utils files
mistralai/devstral-2512__json_schema PASS 81% 226 in → 16 out
Add new files: config, main, utils
JSON Schema Structured Output
(raw) {"commit_message": "Add new files: config, main, utils"}
poolside/laguna-xs.2:high__json_schema PASS 63.3% 274 in → 269 out (248 reasoning)
Add config, main, and utils files
JSON Schema Structured Output
(raw) { "commit_message": "Add config, main, and utils files" }
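For the JSON Schema Structured Output runs, the commit message is the value of the commit_message key in the raw reply. A minimal sketch of extracting it follows; the function name extract_commit_message is hypothetical, and the single-string-field shape is inferred from the raw outputs listed above rather than from a published schema.

```python
import json


def extract_commit_message(raw: str) -> str:
    """Parse a raw structured reply and return its commit message."""
    payload = json.loads(raw)
    message = payload["commit_message"]
    if not isinstance(message, str):
        raise ValueError("commit_message must be a string")
    return message


# Two raw replies copied from the results above; whitespace differences
# inside the JSON do not affect the parsed value.
compact = '{"commit_message":"Add config.py, main.py, and utils.py"}'
spaced = '{ "commit_message": "Add config, main, utils modules" }'

print(extract_commit_message(compact))
print(extract_commit_message(spaced))
```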