GitBench
Search with line numbers using git grep -n
Tests ability to search with line numbers using git grep -n. Evaluates output-format options for search results.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. mkdir -p src
  5. printf 'import os\nimport sys\nimport logging\nlogger = logging.getLogger(__name__)\ndef main():\n    logger.info("Starting")\n    os.environ["MODE"] = "prod"\n    sys.exit(0)\nif __name__ == "__main__":\n    main()\n' > src/app.py
  6. git add .
  7. git commit -m 'Add application entry point'
  8. echo 'git grep -n import' > .grep_command
  9. git add .grep_command
  10. git commit -m 'Add grep sentinel'
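As a rough illustration of what the setup produces, the sketch below imitates the `path:line:content` layout that `git grep -n` prints for each match. It is not Git itself: `grep_n` and `APP_PY` are hypothetical names, the match is a plain substring test, and the file content assumes one statement per line, which is what the expected answer implies.

```python
# Hypothetical stand-in for `git grep -n <pattern>` on a single file.
# Assumption: src/app.py holds one statement per line.
APP_PY = """import os
import sys
import logging
logger = logging.getLogger(__name__)
def main():
    logger.info("Starting")
    os.environ["MODE"] = "prod"
    sys.exit(0)
if __name__ == "__main__":
    main()
"""


def grep_n(path: str, text: str, pattern: str) -> list[str]:
    """Return matches as 'path:line:content', the layout git grep -n uses."""
    return [
        f"{path}:{number}:{line}"
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern in line  # plain substring match, not a regex
    ]


for match in grep_n("src/app.py", APP_PY, "import"):
    print(match)
```

Run in the real repository, the command would also search the tracked `.grep_command` file, which itself contains the word `import`, so its output is not limited to `src/app.py`.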
Prompt
Here is the output of a git grep -n command showing line numbers. On which line numbers does 'import' appear in src/app.py? Output ONLY the line numbers separated by commas, nothing else.
Expected
1,2,3
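One way to arrive at the expected answer mechanically is to parse the `path:line:content` rows and keep only those for the requested file. This is a minimal sketch: `line_numbers_for` is a hypothetical helper, and `SAMPLE` is an assumed rendering of the command's output in this repo, not text captured from the benchmark.

```python
def line_numbers_for(grep_output: str, path: str) -> str:
    """Collect line numbers for one file from git grep -n style output."""
    numbers = []
    for row in grep_output.splitlines():
        # Split on the first two colons only; the matched text may contain ':'.
        parts = row.split(":", 2)
        if len(parts) == 3 and parts[0] == path and parts[1].isdigit():
            numbers.append(parts[1])
    return ",".join(numbers)


# Assumed output: .grep_command is tracked and also contains "import".
SAMPLE = """.grep_command:1:git grep -n import
src/app.py:1:import os
src/app.py:2:import sys
src/app.py:3:import logging
"""

print(line_numbers_for(SAMPLE, "src/app.py"))
```

The path filter matters here because the sentinel file matches too. The split assumes the path itself contains no colon.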
deepseek/deepseek-v4-flash:high PASS 100% 72 in → 52 out (40 reasoning)
1,2,3
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 72 in → 58 out (43 reasoning)
1,2,3
JSON Schema Structured Output
(raw) { "line_numbers": "1,2,3" }
deepseek/deepseek-v4-flash:none PASS 100% 72 in → 5 out (0 reasoning)
1,2,3
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 72 in → 14 out (0 reasoning)
1,2,3
JSON Schema Structured Output
(raw) { "line_numbers": "1,2,3" }
mistralai/devstral-2512 PASS 100% 72 in → 6 out
1,2,3
mistralai/devstral-2512__json_schema PASS 100% 72 in → 12 out
1,2,3
JSON Schema Structured Output
(raw) {"line_numbers": "1,2,3"}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 85 in → 56 out (45 reasoning)
1,2,3
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 85 in → 164 out (147 reasoning)
1,2,3
JSON Schema Structured Output
(raw) { "line_numbers": "1,2,3" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 85 in → 6 out (0 reasoning)
1,2,3
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 85 in → 15 out (0 reasoning)
1,2,3
JSON Schema Structured Output
(raw) { "line_numbers" : "1,2,3" }
poolside/laguna-xs.2:high PASS 100% 116 in → 101 out (93 reasoning)
1,2,3
poolside/laguna-xs.2:high__json_schema PASS 100% 116 in → 63 out (49 reasoning)
1,2,3
JSON Schema Structured Output
(raw) {"line_numbers": "1,2,3"}
poolside/laguna-xs.2:none PASS 100% 116 in → 7 out (0 reasoning)
1,2,3
poolside/laguna-xs.2:none__json_schema PASS 100% 116 in → 17 out (0 reasoning)
1,2,3
JSON Schema Structured Output
(raw) { "line_numbers": "1,2,3" }
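The results above come in two shapes: a bare `1,2,3` string and a JSON object with a `line_numbers` field. The benchmark's actual grader is not shown, so the following is only a guess at how both shapes could be compared against the expected value. `passes` and `normalize` are hypothetical names, and tolerating spaces around commas is an assumption.

```python
import json

EXPECTED = "1,2,3"


def normalize(answer: str) -> str:
    """Strip whitespace around each comma-separated item."""
    return ",".join(part.strip() for part in answer.split(","))


def passes(raw: str, expected: str = EXPECTED) -> bool:
    """Accept either a bare answer or a JSON object with 'line_numbers'."""
    raw = raw.strip()
    if raw.startswith("{"):
        try:
            answer = json.loads(raw)["line_numbers"]
        except (json.JSONDecodeError, KeyError):
            return False
    else:
        answer = raw
    return isinstance(answer, str) and normalize(answer) == expected


print(passes("1,2,3"))
print(passes('{ "line_numbers" : "1,2,3" }'))
```

Parsing the JSON before comparing means spacing differences inside the raw object, such as `"line_numbers" : "1,2,3"`, do not affect the result.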