GitBench
Executable permission change
Tests the ability to generate a commit message for an executable-permission change. Evaluates whether the model distinguishes metadata changes (here, the file mode) from content changes.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo '#!/bin/bash' > build.sh
  5. echo 'echo building...' >> build.sh
  6. git add build.sh
  7. git commit -m 'Add build script'
  8. chmod +x build.sh
  9. git add build.sh
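
To see why this case is about metadata rather than content, the setup can be replayed in a scratch directory and the staged diff inspected. The following is a minimal sketch, not the benchmark's own harness: the temporary directory, the `-q` flags, the explicit `core.fileMode` setting, and the `staged_diff` variable are assumptions added for illustration.

```shell
#!/bin/sh
# Replay the setup commands in a throwaway repository (assumed location).
set -eu
repo=$(mktemp -d)
cd "$repo"

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
# Assumption: make sure Git tracks the executable bit on this file system.
git config core.fileMode true

echo '#!/bin/bash' > build.sh
echo 'echo building...' >> build.sh
git add build.sh
git commit -q -m 'Add build script'
chmod +x build.sh
git add build.sh

# Show what is staged: a pure permission change has mode lines only,
# with no "@@" content hunks.
staged_diff=$(git diff --cached)
printf '%s\n' "$staged_diff"
```

On a file system that records the executable bit, the staged diff should contain the `diff --git` header followed by `old mode 100644` and `new mode 100755`, and nothing else. The absence of any content hunk is what the expected message has to reflect.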
Prompt
Generate a concise commit message (max 50 characters) describing the changes in this diff. Focus on the action taken (e.g., 'Add', 'Fix', 'Update').
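
The 50-character limit in the prompt is easy to check mechanically. The sketch below uses a hypothetical helper, `fits_limit`, which is not part of GitBench; it only illustrates the length constraint and says nothing about how the benchmark actually scores a response.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the message is non-empty
# and at most 50 characters long.
fits_limit() {
    msg=$1
    len=${#msg}
    [ "$len" -gt 0 ] && [ "$len" -le 50 ]
}

# Try one message from this page and one deliberately long example.
for candidate in \
    'Make build.sh executable' \
    'Fix a very long-winded description of a one-bit permission change'
do
    if fits_limit "$candidate"; then
        echo "within limit (${#candidate} chars): $candidate"
    else
        echo "over limit (${#candidate} chars): $candidate"
    fi
done
```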
Expected
Make build.sh executable
nvidia/nemotron-3-nano-30b-a3b:high PASS 93.3% 87 in → 189 out (170 reasoning)
Make build.sh executable
poolside/laguna-xs.2:high PASS 93.3% 122 in → 214 out (207 reasoning)
Make build.sh executable
deepseek/deepseek-v4-flash:none__json_schema PASS 93% 66 in → 11 out (0 reasoning)
Make build.sh executable
JSON Schema Structured Output
(raw) {"commit_message": "Make build.sh executable"}
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 93% 87 in → 180 out (156 reasoning)
Make build.sh executable
JSON Schema Structured Output
(raw) { "commit_message": "Make build.sh executable" }
deepseek/deepseek-v4-flash:none PASS 91.7% 66 in → 6 out (0 reasoning)
Fix permissions in build.sh
mistralai/devstral-2512__json_schema PASS 91.7% 74 in → 13 out
Update build.sh permissions to executable
JSON Schema Structured Output
(raw) {"commit_message": "Update build.sh permissions to executable"}
poolside/laguna-xs.2:high__json_schema PASS 91% 122 in → 269 out (252 reasoning)
Make build.sh executable
JSON Schema Structured Output
(raw) { "commit_message": "Make build.sh executable" }
poolside/laguna-xs.2:none__json_schema PASS 90% 122 in → 16 out (0 reasoning)
Chmod +x build.sh for executable permissions
JSON Schema Structured Output
(raw) {"commit_message": "Chmod +x build.sh for executable permissions"}
mistralai/devstral-2512 PASS 88.3% 74 in → 7 out
"Fix build.sh permissions"
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 87.7% 87 in → 14 out (0 reasoning)
Fix permissions to make build.sh executable
JSON Schema Structured Output
(raw) {"commit_message": "Fix permissions to make build.sh executable"}
nvidia/nemotron-3-nano-30b-a3b:none PASS 87% 87 in → 5 out (0 reasoning)
Update build script permissions
poolside/laguna-xs.2:none PASS 86.7% 122 in → 6 out (0 reasoning)
Fix build.sh permissions
deepseek/deepseek-v4-flash:high PASS 81.7% 66 in → 164 out (160 reasoning)
Make build.sh executable
Invalid JSON. Output:
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)