GitBench
File renamed
Tests the ability to generate a commit message for a file rename. Evaluates whether the model recognizes rename operations in diffs.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'old content' > oldname.txt
  5. git add oldname.txt
  6. git commit -m 'Initial commit'
  7. git mv oldname.txt newname.txt
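The setup steps can be replayed locally to see what the staged change looks like. This is a minimal sketch, assuming a POSIX shell with git on PATH; it works in a throwaway temporary directory and prints the staged change with rename detection (-M) enabled. It is one way to view the change, not necessarily the exact diff format the benchmark shows the model.

```shell
#!/bin/sh
# Replay the benchmark's setup steps in a temporary directory.
# Assumes git is installed and a POSIX shell is available.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo 'old content' > oldname.txt
git add oldname.txt
git commit -q -m 'Initial commit'
git mv oldname.txt newname.txt
# Show the staged change with rename detection enabled (-M).
# The content is unchanged, so git reports a 100% similarity rename (R100).
git diff --cached -M --name-status
```

Because the file content is identical, git pairs the two paths as a rename. With --no-renames the same staged change would instead appear as an addition of newname.txt and a deletion of oldname.txt, which is the less direct signal a model would have to interpret.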
Prompt
Generate a concise commit message (max 50 characters) describing the changes in this diff. Focus on the action taken (e.g., 'Add', 'Fix', 'Update', 'Rename').
Expected
Rename oldname.txt to newname.txt
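The prompt states two format constraints: at most 50 characters, and a leading action verb. The hypothetical check_message function below is a sketch of how a candidate message could be tested against them; the verb list only mirrors the prompt's examples and is an assumption, not the benchmark's actual grader.

```shell
#!/bin/sh
# Hypothetical format check for a commit message, based only on the
# prompt's stated constraints. Not the benchmark's scoring logic.
check_message() {
  msg=$1
  # Constraint 1: at most 50 characters.
  if [ "${#msg}" -gt 50 ]; then
    echo "too long (${#msg} chars)"
    return 1
  fi
  # Constraint 2: start with one of the prompt's example verbs.
  case $msg in
    "Add "*|"Fix "*|"Update "*|"Rename "*) echo "ok (${#msg} chars)" ;;
    *) echo "does not start with Add/Fix/Update/Rename"; return 1 ;;
  esac
}

check_message 'Rename oldname.txt to newname.txt'  # prints: ok (33 chars)
```

This check covers format only. "Rename file from oldname.txt to newname.txt" (43 characters) also passes it, even though it differs from the expected text.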
Loading campaign evidence…
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 98% 94 in → 283 out (196 reasoning)
Rename oldname.txt to newname.txt
JSON Schema Structured Output
(raw) { "commit_message": "Rename oldname.txt to newname.txt" }
poolside/laguna-xs.2:high PASS 97.7% 128 in → 537 out (526 reasoning)
Rename oldname.txt to newname.txt
deepseek/deepseek-v4-flash:high PASS 96% 81 in → 105 out (91 reasoning)
Rename oldname.txt to newname.txt
mistralai/devstral-2512__json_schema PASS 96% 81 in → 16 out
Rename oldname.txt to newname.txt
JSON Schema Structured Output
(raw) {"commit_message": "Rename oldname.txt to newname.txt"}
mistralai/devstral-2512 PASS 93.3% 81 in → 10 out
Rename oldname.txt to newname.txt
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 92.7% 94 in → 18 out (0 reasoning)
Rename oldname.txt to newname.txt
JSON Schema Structured Output
(raw) { "commit_message": "Rename oldname.txt to newname.txt" }
poolside/laguna-xs.2:high__json_schema PASS 92.7% 128 in → 229 out (213 reasoning)
Rename oldname.txt to newname.txt
JSON Schema Structured Output
(raw) {"commit_message": "Rename oldname.txt to newname.txt"}
poolside/laguna-xs.2:none__json_schema PASS 92.7% 128 in → 20 out (0 reasoning)
Rename oldname.txt to newname.txt
JSON Schema Structured Output
(raw) { "commit_message": "Rename oldname.txt to newname.txt" }
nvidia/nemotron-3-nano-30b-a3b:high PASS 92.3% 94 in → 451 out (306 reasoning)
Rename oldname.txt to newname.txt
deepseek/deepseek-v4-flash:high__json_schema PASS 91.7% 81 in → 162 out (146 reasoning)
Rename oldname.txt to newname.txt
JSON Schema Structured Output
(raw) {"commit_message":"Rename oldname.txt to newname.txt"}
poolside/laguna-xs.2:none PASS 91.7% 128 in → 10 out (0 reasoning)
Rename oldname.txt to newname.txt
deepseek/deepseek-v4-flash:none__json_schema PASS 90% 83 in → 14 out (0 reasoning)
Rename oldname.txt to newname.txt
JSON Schema Structured Output
(raw) {"commit_message":"Rename oldname.txt to newname.txt"}
nvidia/nemotron-3-nano-30b-a3b:none PASS 83.3% 94 in → 12 out (0 reasoning)
Rename file from oldname.txt to newname.txt
deepseek/deepseek-v4-flash:none FAIL
(empty output)
Failure: Model 'deepseek/deepseek-v4-flash:none' violated the reasoning_level=none invariant: provider reported 132 reasoning token(s)
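The __json_schema runs return a raw object with a single commit_message field, as shown in the (raw) lines. The hypothetical extract_commit_message function below is a minimal sketch of pulling that string out with sed, assuming a flat object whose value contains no escaped quotes; a real JSON parser such as jq (jq -r .commit_message) would be more robust.

```shell
#!/bin/sh
# Minimal sketch: extract the commit_message value from a raw structured
# output reply. Assumes a flat JSON object and no escaped quotes in the value.
extract_commit_message() {
  printf '%s\n' "$1" |
    sed -n 's/.*"commit_message"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Both spacing styles seen in the raw outputs yield the same message.
extract_commit_message '{ "commit_message": "Rename oldname.txt to newname.txt" }'
extract_commit_message '{"commit_message":"Rename oldname.txt to newname.txt"}'
```

Allowing optional whitespace around the colon is what lets one pattern handle both the compact and the spaced replies.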