f001 — git

Basic keyword search across tracked files

Tests the ability to find which tracked files contain a keyword using git grep. Evaluates basic code-search comprehension.

Baseline Repository

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

git init
git config user.email 'test@test.com'
git config user.name 'Test User'
mkdir -p src
echo 'def authenticate(user, password): if not user: raise ValueError("user required") return check_credentials(user, password)' > src/auth.py
echo 'def render_page(user): template = load_template("home.html") return template.render(user=user)' > src/views.py
echo 'import logging logger = logging.getLogger(__name__) def check_credentials(user, password): logger.info(f"Auth attempt for {user}") return db.lookup(user, password)' > src/db.py
git add .
git commit -m 'Add project files'
echo 'git grep authenticate' > .grep_command
git add .grep_command
git commit -m 'Add grep sentinel'
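Conceptually, git grep scans the contents of tracked files line by line, and git grep -l reports only the paths that match. The following minimal Python sketch mimics that -l behaviour under two simplifying assumptions: the tracked files are given as an in-memory mapping (restricted here to the three files under src/ created by the setup commands above), and the pattern is treated as a literal substring rather than a regular expression. The names TRACKED and grep_files are hypothetical.

```python
from typing import Dict, List

# Contents of the three src/ files, copied from the setup echo commands.
TRACKED: Dict[str, str] = {
    "src/auth.py": 'def authenticate(user, password): if not user: raise ValueError("user required") return check_credentials(user, password)',
    "src/views.py": 'def render_page(user): template = load_template("home.html") return template.render(user=user)',
    "src/db.py": 'import logging logger = logging.getLogger(__name__) def check_credentials(user, password): logger.info(f"Auth attempt for {user}") return db.lookup(user, password)',
}


def grep_files(files: Dict[str, str], keyword: str) -> List[str]:
    """Return the paths with at least one line containing `keyword`, sorted by path.

    Simplification: literal substring match, not git's regex matching.
    """
    return sorted(
        path
        for path, text in files.items()
        if any(keyword in line for line in text.splitlines())
    )


print(grep_files(TRACKED, "authenticate"))  # ['src/auth.py']
```

Searching for check_credentials instead would return both src/auth.py and src/db.py, since the function is called in one file and defined in the other.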

Prompt

Here is the output of a git grep command run on this repository. Which files contain the word 'authenticate'? Output ONLY the filenames, one per line, nothing else.
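By default git grep prints one path:matching-line record per match, so answering the prompt amounts to taking the text before the first colon on each line and dropping duplicates. A minimal Python sketch of that extraction follows; the function name filenames_from_grep is hypothetical, and it assumes paths contain no colons (git grep -l avoids the parsing step entirely by printing only the matching paths).

```python
from typing import List


def filenames_from_grep(output: str) -> List[str]:
    """Extract unique file paths from default git grep output.

    Assumes each non-empty line has the form 'path:matching text'
    and that the paths themselves contain no ':' characters.
    """
    seen: List[str] = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        path = line.split(":", 1)[0]
        if path not in seen:
            seen.append(path)  # keep first-seen order, drop repeats
    return seen


# The record git grep would print for src/auth.py in this repository.
sample = 'src/auth.py:def authenticate(user, password): if not user: raise ValueError("user required") return check_credentials(user, password)\n'
print("\n".join(filenames_from_grep(sample)))  # src/auth.py
```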

Expected

src/auth.py
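The page does not state how a PASS is scored. One plausible check, offered only as a hypothetical sketch, normalizes the model's reply into a set of trimmed, non-empty lines and compares it with the expected filenames; the function name grade and the set-equality rule are assumptions, not the benchmark's documented grader.

```python
from typing import Iterable


def grade(reply: str, expected: Iterable[str]) -> bool:
    """Hypothetical grader: PASS when the reply lists exactly the expected files.

    Line order and surrounding whitespace are ignored; blank lines are dropped.
    """
    got = {line.strip() for line in reply.splitlines() if line.strip()}
    return got == set(expected)


print(grade("src/auth.py\n", ["src/auth.py"]))  # True
print(grade("src/auth.py\nsrc/db.py", ["src/auth.py"]))  # False
```

A set comparison tolerates reordering but still fails a reply that adds an extra file or omits one.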

Model Outputs (14)

deepseek/deepseek-v4-flash:high PASS 100% 52 in → 74 out (70 reasoning)

src/auth.py

deepseek/deepseek-v4-flash:high__json_schema PASS 100% 52 in → 78 out (66 reasoning)

src/auth.py

JSON Schema Structured Output