f007 — git

Context lines with git grep -C

Tests the ability to predict the context lines that git grep -C prints around each match. Evaluates understanding of surrounding-line output, including how many lines the output contains in total.

medium git-grep context-lines surrounding

Baseline Repository

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

git init
git config user.email 'test@test.com'
git config user.name 'Test User'
printf 'import sys import os CONFIG = { "host": "localhost", "port": 8080, "debug": True, "workers": 4, } def start_server(): host = CONFIG["host"] port = CONFIG["port"] print(f"Starting on {host}:{port}") ' > server.py
git add .
git commit -m 'Add server config'
echo 'git grep -C 2 port' > .grep_command
git add .grep_command
git commit -m 'Add grep sentinel'
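The line count asked for in this test can be reasoned about with a small model of how grep-style context output is assembled: each match contributes a window of surrounding lines, overlapping or adjacent windows merge into one group, and a "--" separator line is printed between groups. The Python sketch below is only an illustration of that counting rule, not the harness's actual checker. The function name grep_context_line_count and the constant ASSUMED_SERVER_PY are hypothetical, and the line layout of server.py (one statement per line, with a blank line after the imports and after the CONFIG dict) is an assumption, since the line breaks of the printf argument are not visible in the listing above. The sketch models a single file only, so it does not account for matches in any other tracked file.

```python
import re

# ASSUMPTION: this layout of server.py is a guess; the original line breaks
# of the printf argument are not recoverable from the setup listing.
ASSUMED_SERVER_PY = """\
import sys
import os

CONFIG = {
    "host": "localhost",
    "port": 8080,
    "debug": True,
    "workers": 4,
}

def start_server():
    host = CONFIG["host"]
    port = CONFIG["port"]
    print(f"Starting on {host}:{port}")
"""


def grep_context_line_count(lines, pattern, context):
    """Count the output lines of a grep -C style search over one file.

    The count includes matching lines, context lines, and one "--"
    separator line between non-adjacent groups.
    """
    matches = [i for i, line in enumerate(lines) if re.search(pattern, line)]
    if not matches:
        return 0

    # Merge each match's [start, end] window with the previous one when
    # the two overlap or touch.
    windows = []
    for i in matches:
        start = max(0, i - context)
        end = min(len(lines) - 1, i + context)
        if windows and start <= windows[-1][1] + 1:
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])

    shown = sum(end - start + 1 for start, end in windows)
    separators = len(windows) - 1
    return shown + separators


print(grep_context_line_count(ASSUMED_SERVER_PY.splitlines(), "port", 2))  # → 13
```

Under the assumed layout, "port" matches five lines, including both import lines because "import" contains "port". The windows merge into two groups of 8 and 4 lines, and one separator brings the total to 13, which agrees with the expected answer quoted in the failure messages.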

Prompt

Here is the output of a git grep -C 2 command showing 2 lines of context around matches for 'port'. How many total lines (including context) appear in the output? Output ONLY the number, nothing else.

Expected

13

Model Outputs (14)

deepseek/deepseek-v4-flash:high PASS 100% 156 in → 2,827 out (2,824 reasoning)

deepseek/deepseek-v4-flash:high__json_schema PASS 100% 219 in → 245 out (224 reasoning)

JSON Schema Structured Output

(raw) {"count": 13}

nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 169 in → 2,343 out (2,350 reasoning)

nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 169 in → 1,285 out (1,286 reasoning)

JSON Schema Structured Output

(raw) { "count" : 13 }

poolside/laguna-xs.2:high PASS 100% 210 in → 823 out (818 reasoning)

poolside/laguna-xs.2:high__json_schema PASS 100% 210 in → 409 out (398 reasoning)

JSON Schema Structured Output

(raw) {"count": 13}

deepseek/deepseek-v4-flash:none FAIL 0% 156 in → 1 out (0 reasoning)

Failure: Expected numeric answer '13', got '12'

deepseek/deepseek-v4-flash:none__json_schema FAIL 0% 156 in → 11 out (0 reasoning)

JSON Schema Structured Output

(raw) { "count": 6 }

Failure: Expected numeric answer '13', got '6'

mistralai/devstral-2512 FAIL 0% 156 in → 3 out

Failure: Expected numeric answer '13', got '12'

mistralai/devstral-2512__json_schema FAIL 0% 156 in → 8 out

JSON Schema Structured Output

(raw) {"count": 12}

Failure: Expected numeric answer '13', got '12'

nvidia/nemotron-3-nano-30b-a3b:none FAIL 0% 169 in → 2 out (0 reasoning)

Failure: Expected numeric answer '13', got '5'

nvidia/nemotron-3-nano-30b-a3b:none__json_schema FAIL 0% 169 in → 11 out (0 reasoning)

JSON Schema Structured Output

(raw) { "count": 15 }

Failure: Expected numeric answer '13', got '15'

poolside/laguna-xs.2:none FAIL 0% 210 in → 4 out (0 reasoning)

Failure: Expected numeric answer '13', got '30'

poolside/laguna-xs.2:none__json_schema FAIL 0% 210 in → 7 out (0 reasoning)

JSON Schema Structured Output

(raw) {"count":15}

Failure: Expected numeric answer '13', got '15'