GitBench
Regex search using git grep -E
Tests the ability to run a regex search with git grep -E, evaluating extended regular expression (ERE) pattern matching in Git.
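Without -E, git grep treats the pattern as a POSIX basic regular expression, in which ( ) and | are ordinary characters. The -E flag switches to extended syntax, where they provide grouping and alternation. The minimal sketch below shows the difference. It assumes git is on PATH and uses a hypothetical three-line file, and git grep --no-index lets it search a plain directory without creating a repository.

```shell
#!/bin/sh
# Sketch: compare git grep's default basic regex with -E (extended).
# The sample file below is hypothetical, not part of the GitBench fixture.
set -e

dir=$(mktemp -d)
printf '%s\n' \
  'def get_user(id): return db.find(id)' \
  'def list_users(): return db.all()' \
  'def delete_post(id): posts.remove(id)' > "$dir/api.py"

cd "$dir"
# Basic regex (the default): "(", ")" and "|" are literal, so no line matches.
basic=$(git grep --no-index 'def (get|list)_' -- api.py | wc -l | tr -d ' ')
# Extended regex: (get|list) is a group that matches either alternative.
extended=$(git grep --no-index -E 'def (get|list)_' -- api.py | wc -l | tr -d ' ')
cd - >/dev/null
rm -rf "$dir"

echo "basic: $basic, extended: $extended"
```

Only the -E form matches the get_user and list_users lines. The pattern used in this benchmark, def get_, contains no ERE metacharacters, so it would match the same lines with or without -E.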

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. mkdir -p src
  5. echo 'def get_user(id): return db.find(id)' > src/api.py
  6. echo 'def get_post(id): return posts.find(id)' >> src/api.py
  7. echo 'def get_comment(id): return comments.find(id)' >> src/api.py
  8. echo 'def list_users(): return db.all()' >> src/api.py
  9. echo 'def delete_post(id): posts.remove(id)' >> src/api.py
  10. git add .
  11. git commit -m 'Add API functions'
  12. echo 'git grep -E "def get_"' > .grep_command
  13. git add .grep_command
  14. git commit -m 'Add grep sentinel'
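The setup steps above can be replayed in a scratch directory to check the match count by hand. This is a minimal sketch that assumes git is on PATH. The mktemp scratch directory and the -- src pathspec are illustrative choices, not part of the GitBench harness. The commits are skipped because git grep searches files that are tracked in the index, so staging them is enough. Note that the sentinel file .grep_command itself contains the text def get_, so a search over the whole repository also reports that line.

```shell
#!/bin/sh
# Sketch: rebuild the benchmark fixture and count git grep -E matches.
# Scratch-directory handling is an assumption, not the GitBench harness.
set -e

repo=$(mktemp -d)
git -C "$repo" init -q
mkdir -p "$repo/src"
printf '%s\n' \
  'def get_user(id): return db.find(id)' \
  'def get_post(id): return posts.find(id)' \
  'def get_comment(id): return comments.find(id)' \
  'def list_users(): return db.all()' \
  'def delete_post(id): posts.remove(id)' > "$repo/src/api.py"
echo 'git grep -E "def get_"' > "$repo/.grep_command"
# Staging makes the files tracked; git grep needs no commit to search them.
git -C "$repo" add .

# Limited to src/: only the three get_* definitions match.
count=$(git -C "$repo" grep -E 'def get_' -- src | wc -l | tr -d ' ')
# Whole repository: the line in .grep_command matches as well.
all=$(git -C "$repo" grep -E 'def get_' | wc -l | tr -d ' ')

echo "src/ only: $count"
echo "whole repo: $all"
rm -rf "$repo"
```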
Prompt
Here is the output of a git grep -E command that searches for function definitions matching 'def get_'. How many matching lines are there? Output ONLY the number, nothing else.
Expected
3
deepseek/deepseek-v4-flash:high PASS 100% 84 in → 40 out (38 reasoning)
3
deepseek/deepseek-v4-flash:none PASS 100% 84 in → 2 out (0 reasoning)
3
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 86 in → 7 out (0 reasoning)
3
JSON Schema Structured Output
(raw) { "count": 3 }
mistralai/devstral-2512 PASS 100% 82 in → 2 out
3
mistralai/devstral-2512__json_schema PASS 100% 82 in → 7 out
3
JSON Schema Structured Output
(raw) {"count": 3}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 95 in → 95 out (96 reasoning)
3
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 95 in → 71 out (67 reasoning)
3
JSON Schema Structured Output
(raw) { "count": 3 }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 95 in → 2 out (0 reasoning)
3
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 95 in → 10 out (0 reasoning)
3
JSON Schema Structured Output
(raw) { "count": 3 }
poolside/laguna-xs.2:high PASS 100% 130 in → 114 out (110 reasoning)
3
poolside/laguna-xs.2:high__json_schema PASS 100% 130 in → 106 out (98 reasoning)
3
JSON Schema Structured Output
(raw) {"count": 3}
poolside/laguna-xs.2:none PASS 100% 130 in → 3 out (0 reasoning)
3
poolside/laguna-xs.2:none__json_schema PASS 100% 130 in → 7 out (0 reasoning)
3
JSON Schema Structured Output
(raw) {"count": 3}
Invalid structured output. Output: 3
JSON Schema Structured Output
Structured Output Error
Structured output schema validation failed: $ must be of type object