GitBench
Multiple patterns with git grep -e
Tests the ability to search for multiple patterns with git grep -e. Patterns given with separate -e options are combined with OR, so a line is reported if it matches any of them.
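A minimal Python sketch of that OR semantics. It assumes the patterns can be treated as Python regular expressions. git grep defaults to POSIX basic regular expressions, but for plain words like json and requests the two dialects agree. The helper name grep_or is hypothetical.

```python
import re
from typing import Iterable, List


def grep_or(lines: Iterable[str], patterns: List[str]) -> List[str]:
    """Return the lines that match at least one pattern (OR semantics).

    Sketch only: patterns are compiled as Python regexes, not POSIX BRE.
    """
    compiled = [re.compile(p) for p in patterns]
    # Each line is kept at most once, even if several patterns match it,
    # mirroring how git grep prints a matching line only once.
    return [line for line in lines if any(rx.search(line) for rx in compiled)]


sample = ["import requests", "import json", "import os"]
print(grep_or(sample, ["json", "requests"]))  # ['import requests', 'import json']
```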

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. mkdir -p src
  5. echo 'import requests
import json
import os
API_URL = "https://api.example.com"
def fetch_data(endpoint):
    resp = requests.get(f"{API_URL}/{endpoint}")
    return json.loads(resp.text)
def save_to_file(data, path):
    with open(path, "w") as f:
        json.dump(data, f)
def load_from_file(path):
    with open(path) as f:
        return json.load(f)
' > src/io.py
  6. git add .
  7. git commit -m 'Add IO module'
  8. echo 'git grep -e json -e requests' > .grep_command
  9. git add .grep_command
  10. git commit -m 'Add grep sentinel'
Prompt
Here is the output of a git grep -e command that searches for multiple patterns ('json' and 'requests'). How many total matching lines appear? Output ONLY the number, nothing else.
Expected
6
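As a sanity check on the expected answer, the sketch below counts the lines of src/io.py that contain json or requests. It prints them roughly in git grep's default path:line form. It assumes the one-statement-per-line, four-space-indented layout of src/io.py shown in the setup. It only looks at that file and does not run git.

```python
import re

# Assumed reconstruction of src/io.py as written by setup step 5.
IO_PY = '''import requests
import json
import os
API_URL = "https://api.example.com"
def fetch_data(endpoint):
    resp = requests.get(f"{API_URL}/{endpoint}")
    return json.loads(resp.text)
def save_to_file(data, path):
    with open(path, "w") as f:
        json.dump(data, f)
def load_from_file(path):
    with open(path) as f:
        return json.load(f)
'''

# A single alternation is equivalent to passing -e json -e requests.
pattern = re.compile(r"json|requests")
matches = [line for line in IO_PY.splitlines() if pattern.search(line)]

for line in matches:
    print(f"src/io.py:{line}")
print(len(matches))  # 6
```

The six hits are the two imports, requests.get, json.loads, json.dump and json.load.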
deepseek/deepseek-v4-flash:high PASS 100% 114 in → 139 out (136 reasoning)
6
deepseek/deepseek-v4-flash:none PASS 100% 114 in → 2 out (0 reasoning)
6
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 114 in → 11 out (0 reasoning)
6
JSON Schema Structured Output
(raw) { "count": 6 }
mistralai/devstral-2512__json_schema PASS 100% 104 in → 7 out
6
JSON Schema Structured Output
(raw) {"count": 6}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 117 in → 166 out (187 reasoning)
6
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 117 in → 258 out (265 reasoning)
6
JSON Schema Structured Output
(raw) { "count": 6 }
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 117 in → 8 out (0 reasoning)
6
JSON Schema Structured Output
(raw) { "count": 6 }
poolside/laguna-xs.2:high PASS 100% 153 in → 185 out (181 reasoning)
6
poolside/laguna-xs.2:high__json_schema PASS 100% 153 in → 156 out (149 reasoning)
6
JSON Schema Structured Output
(raw) {"count":6}
Invalid JSON. Output:
JSON Schema Structured Output
Structured Output Error
Failure: Failed to parse structured JSON response: Expecting value: line 1 column 1 (char 0)
mistralai/devstral-2512 FAIL 0% 104 in → 2 out
5
Failure: Expected numeric answer '6', got '5'
nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 117 in → 2 out (0 reasoning)
3
Failure: Expected numeric answer '6', got '3'
poolside/laguna-xs.2:none FAIL 0% 153 in → 3 out (0 reasoning)
7
Failure: Expected numeric answer '6', got '7'
poolside/laguna-xs.2:none__json_schema FAIL 0% 153 in → 7 out (0 reasoning)
7
JSON Schema Structured Output
(raw) {"count": 7}
Failure: Expected numeric answer '6', got '7'
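The failure messages suggest how answers are scored. Plain replies are compared numerically with the expected value. json_schema runs are parsed as JSON with a count field. The following is a hypothetical Python sketch of such a check: the function name grade and the handling of the count key are assumptions, not GitBench's actual code. Calling json.loads on an empty reply raises exactly the "Expecting value: line 1 column 1 (char 0)" error shown in the failed structured-output entry.

```python
import json
from typing import Tuple


def grade(reply: str, expected: int, structured: bool) -> Tuple[bool, str]:
    """Hypothetical grader: returns (passed, message)."""
    if structured:
        try:
            # Structured runs are assumed to return an object like {"count": 6}.
            value = json.loads(reply)["count"]
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            return False, f"Failed to parse structured JSON response: {exc}"
    else:
        value = reply.strip()
    try:
        got = int(value)
    except (TypeError, ValueError):
        return False, f"Expected numeric answer '{expected}', got '{value}'"
    if got != expected:
        return False, f"Expected numeric answer '{expected}', got '{got}'"
    return True, "PASS"


print(grade("6", 6, structured=False))  # (True, 'PASS')
print(grade("5", 6, structured=False))  # (False, "Expected numeric answer '6', got '5'")
print(grade('{"count": 7}', 6, structured=True))
print(grade("", 6, structured=True))
```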