: GitBench
Search with no results returns empty output
Tests ability to handle searches with no results (empty output). Evaluates edge-case handling in search operations.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. 01 git init
  2. 02 git config user.email 'test@test.com'
  3. 03 git config user.name 'Test User'
  4. 04 echo 'def hello(): return "world"' > hello.py
  5. 05 git add hello.py
  6. 06 git commit -m 'Add hello function'
  7. 07 echo 'git grep nonexistent_token_xyz' > .grep_command
  8. 08 git add .grep_command
  9. 09 git commit -m 'Add grep sentinel'
Prompt
Here is the output of a git grep command run on this repository. Did the search find any matches? Output ONLY 'yes' or 'no', nothing else.
Expected
no
Loading campaign evidence…
deepseek/deepseek-v4-flash:high PASS 100% 37 in → 198 out (217 reasoning)
no
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 37 in → 203 out (191 reasoning)
no
JSON Schema Structured Output
(raw) { "found": "no" }
mistralai/devstral-2512 PASS 100% 36 in → 2 out
no
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 49 in → 238 out (264 reasoning)
no
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 49 in → 1,077 out (1,203 reasoning)
no
JSON Schema Structured Output
(raw) { "found": "no" }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 49 in → 2 out (0 reasoning)
no
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 49 in → 10 out (0 reasoning)
no
JSON Schema Structured Output
(raw) { "found": "no" }
poolside/laguna-xs.2:high PASS 100% 84 in → 509 out (505 reasoning)
no
poolside/laguna-xs.2:high__json_schema PASS 100% 84 in → 319 out (309 reasoning)
no
JSON Schema Structured Output
(raw) {"found": "no"}
poolside/laguna-xs.2:none PASS 100% 84 in → 3 out (0 reasoning)
no
deepseek/deepseek-v4-flash:none FAIL 0% 37 in → 2 out (0 reasoning)
yes
Failure: Expected 'no', got 'yes'
deepseek/deepseek-v4-flash:none__json_schema FAIL 0% 39 in → 9 out (0 reasoning)
yes
JSON Schema Structured Output
(raw) { "found": "yes" }
Failure: Expected 'no', got 'yes'
yes
JSON Schema Structured Output
(raw) {"found": "yes"}
Failure: Expected 'no', got 'yes'
poolside/laguna-xs.2:none__json_schema FAIL 0% 84 in → 9 out (0 reasoning)
yes
JSON Schema Structured Output
(raw) {"found": "yes"}
Failure: Expected 'no', got 'yes'