GitBench
List commit messages matching a grep pattern
Tests ability to list commit messages matching a grep pattern. Evaluates log filtering with message output.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'alpha' > file.txt
  5. git add file.txt
  6. git commit -m 'Initial commit'
  7. echo 'beta' > file.txt
  8. git add file.txt
  9. git commit -m 'Add authentication module'
  10. echo 'gamma' > file.txt
  11. git add file.txt
  12. git commit -m 'Fix login bug'
  13. echo 'delta' > file.txt
  14. git add file.txt
  15. git commit -m 'Add authorization checks'
  16. echo 'epsilon' > file.txt
  17. git add file.txt
  18. git commit -m 'Update README'
Prompt
List the commit messages of all commits that contain the word 'Add' in their message. Output one commit message per line, nothing else.
Expected
Add authentication module
Add authorization checks
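One way to obtain this listing from the setup history is `git log` with a message filter and a subject-only format. The sketch below is not the benchmark's reference solution. It rebuilds the same five commits in a scratch directory and filters them. The scratch directory, the `commit` helper, and the variable names are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: rebuild the setup history in a throwaway repo, then filter the log.
set -eu

repo=$(mktemp -d)   # hypothetical scratch location, not part of the benchmark
cd "$repo"

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'

# Illustrative helper: overwrite file.txt and commit it with the given message.
commit() {
    echo "$1" > file.txt
    git add file.txt
    git commit -q -m "$2"
}

commit alpha   'Initial commit'
commit beta    'Add authentication module'
commit gamma   'Fix login bug'
commit delta   'Add authorization checks'
commit epsilon 'Update README'

# --grep filters on the commit message (case-sensitive by default);
# --format=%s prints only the subject line, one per commit, newest first.
matches=$(git log --grep='Add' --format=%s)
printf '%s\n' "$matches"

# --reverse lists the same matches oldest first.
matches_oldest_first=$(git log --reverse --grep='Add' --format=%s)
```

Note that `--grep='Add'` matches the pattern anywhere in the message, so it would also match a message such as 'Added tests'; for this history the distinction does not matter. The default order is newest first, which is the order the recorded model outputs use, while the Expected listing is oldest first. Since those outputs are marked PASS, the grader appears to accept either order.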
deepseek/deepseek-v4-flash:high PASS 100% 710 in → 66 out (57 reasoning)
Add authorization checks Add authentication module
deepseek/deepseek-v4-flash:none PASS 100% 699 in → 8 out (0 reasoning)
Add authorization checks Add authentication module
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 714 in → 29 out (0 reasoning)
Add authorization checks Add authentication module
JSON Schema Structured Output
(raw) { "commit_message_list": [ "Add authorization checks", "Add authentication module" ] }
mistralai/devstral-2512 PASS 100% 883 in → 8 out
Add authorization checks Add authentication module
mistralai/devstral-2512__json_schema PASS 100% 888 in → 17 out
Add authorization checks Add authentication module
JSON Schema Structured Output
(raw) {"commit_message_list": ["Add authorization checks", "Add authentication module"]}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 887 in → 161 out (190 reasoning)
Add authorization checks Add authentication module
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 900 in → 332 out (342 reasoning)
Add authorization checks Add authentication module
JSON Schema Structured Output
(raw) { "commit_message_list": [ "Add authorization checks", "Add authentication module" ] }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 905 in → 8 out (0 reasoning)
Add authorization checks Add authentication module
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 905 in → 24 out (0 reasoning)
Add authorization checks Add authentication module
JSON Schema Structured Output
(raw) { "commit_message_list": [ "Add authorization checks", "Add authentication module" ] }
poolside/laguna-xs.2:high PASS 100% 934 in → 129 out (119 reasoning)
Add authorization checks Add authentication module
poolside/laguna-xs.2:high__json_schema PASS 100% 932 in → 151 out (131 reasoning)
Add authorization checks Add authentication module
JSON Schema Structured Output
(raw) {"commit_message_list": ["Add authorization checks", "Add authentication module"]}
poolside/laguna-xs.2:none PASS 100% 901 in → 9 out (0 reasoning)
Add authorization checks Add authentication module
poolside/laguna-xs.2:none__json_schema PASS 100% 927 in → 17 out (0 reasoning)
Add authorization checks Add authentication module
JSON Schema Structured Output
(raw) {"commit_message_list": ["Add authorization checks", "Add authentication module"]}
Invalid JSON. Output: Add authorization checks Add authentication module
JSON Schema Structured Output
Structured Output Error
Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)