Bad commit changes database.py and api.py - model identifies the multi-file bad commit
Tests whether the model can identify a bad commit that changes two files (src/database.py and src/api.py) from bisect-style test results. Targets multi-file backend regressions.
Baseline Repository
These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.
git init
git config user.email 'test@test.com'
git config user.name 'Test User'
echo '#!/bin/bash' > test.sh
echo 'grep -q "localhost" src/database.py && grep -q "localhost" src/api.py' >> test.sh
chmod +x test.sh
mkdir -p src && echo 'DB_HOST = "localhost"' > src/database.py && echo 'API_BASE = "http://localhost:8000"' > src/api.py
git add test.sh src/database.py src/api.py && git commit -m 'Initial commit'
echo 'DB_POOL = 2' >> src/database.py
git add src/database.py src/api.py && git commit -m 'configure localhost'
echo 'DB_HOST = "production.db.com"' > src/database.py && echo 'API_BASE = "http://production.api.com"' > src/api.py
git add src/database.py src/api.py && git commit -m 'switch to prod'
echo 'DB_POOL = 5' >> src/database.py
git add src/database.py && git commit -m 'add pool config'
echo 'DB_PORT = 5432' >> src/database.py
git add src/database.py && git commit -m 'add port config'
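The setup above yields five commits, and test.sh passes only while both files still contain "localhost". As a rough illustration of why bisecting lands on 'switch to prod', here is a minimal Python sketch that simulates the file contents after each commit by hand and binary-searches for the first failure. It does not run git; the names COMMITS, passes, and first_bad are made up for this sketch and are not part of the benchmark.

```python
# File contents after each baseline commit, oldest first.
# Transcribed by hand from the setup commands; nothing here calls git.
API_LOCAL = 'API_BASE = "http://localhost:8000"\n'
API_PROD = 'API_BASE = "http://production.api.com"\n'
DB_LOCAL = 'DB_HOST = "localhost"\n'
DB_PROD = 'DB_HOST = "production.db.com"\n'

COMMITS = [
    ("Initial commit", DB_LOCAL, API_LOCAL),
    ("configure localhost", DB_LOCAL + "DB_POOL = 2\n", API_LOCAL),
    ("switch to prod", DB_PROD, API_PROD),
    ("add pool config", DB_PROD + "DB_POOL = 5\n", API_PROD),
    ("add port config", DB_PROD + "DB_POOL = 5\nDB_PORT = 5432\n", API_PROD),
]


def passes(database_py, api_py):
    """Same predicate as test.sh: both files must mention localhost."""
    return "localhost" in database_py and "localhost" in api_py


def first_bad(commits):
    """Binary search for the first failing commit.

    Assumes the oldest commit passes and the newest one fails,
    which is the same precondition a bisect session starts from.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        _, database_py, api_py = commits[mid]
        if passes(database_py, api_py):
            good = mid
        else:
            bad = mid
    return commits[bad][0]


print(first_bad(COMMITS))
```

The search tests the middle commit first, then narrows to the boundary between 'configure localhost' (passes) and 'switch to prod' (fails).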
Prompt
You are debugging a git repository. The following commit history has one bad commit that broke the build. Identify the bad commit hash (first 7 characters) or commit subject line that introduced the bug. Use the git log and test output to determine which commit is bad.
Your answer should be the commit hash (e.g., 'abc1234') or the commit subject line.
Expected
switch to prod
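The prompt accepts either a 7-character hash or the subject line, and the hash of the bad commit differs from run to run in the recorded outputs, so the subject line is the only stable expected value. The sketch below shows one way such an answer could be checked. It is a hypothetical checker, not the benchmark's actual grader; the function name is_correct and its parameters are assumptions.

```python
import re


def is_correct(answer, bad_hash, bad_subject="switch to prod"):
    """Hypothetical check: accept the subject line or a hash prefix.

    bad_hash is the hash of the bad commit in the repository the
    model was shown; it must be supplied per run.
    """
    text = answer.strip().lower()
    if bad_subject.lower() in text:
        return True
    # Any hex token of 7 to 40 characters that is a prefix of the
    # bad commit's hash counts as a match.
    tokens = re.findall(r"\b[0-9a-f]{7,40}\b", text)
    return any(bad_hash.lower().startswith(token) for token in tokens)


# Example values taken from the first recorded output on this page.
print(is_correct("The bad commit hash is: **06186cf**", "06186cf"))
print(is_correct("fbf0190", "06186cf"))
```

A substring match on the subject line is deliberately lenient; a stricter grader might require the answer to contain nothing else.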
Model Outputs (14)
fbf0190
Based on the test results, the first commit that failed is `06186cf switch to prod`. The commits before it (older) all passed, and this commit and all commits after it (newer) failed.
The bad commit hash is: **06186cf**
79a319f
JSON Schema
Structured Output
(raw) { "commit": "79a319f" }
Based on the test results, the build was working fine until commit `86f8b1f switch to prod`, which introduced the failure. All subsequent commits (`45baf28` and `2a3756f`) also failed, indicating that the bug was introduced in `86f8b1f`.
The bad commit is:
`86f8b1f` (or the subject line "switch to prod").
8ef6d70
JSON Schema
Structured Output
(raw) {"commit": "8ef6d70"}
09a1930
3d4f5bd
JSON Schema
Structured Output
(raw) {
"commit": "3d4f5bd"
}
The first failure occurs at commit **4c854f8 switch to prod**.
**Commit hash:** `4c854f8` (first 7 characters)
**Commit subject line:** `switch to prod`
This commit introduced the bug that broke the build.
33bf4a8
JSON Schema
Structured Output
(raw) { "commit" : "33bf4a8" }
Looking at the test results chronologically:
- **7a3791c Initial commit**: PASS
- **abe4c11 configure localhost**: PASS
- **99283c6 switch to prod**: FAIL
The tests pass up through commit `abe4c11`, but fail starting with commit `99283c6 "switch to prod"`. This is the commit that introduced the bug.
**Bad commit: 99283c6**
756f72c switch to prod
JSON Schema
Structured Output
(raw) {"commit": "756f72c switch to prod"}
Looking at the test results, I can see that:
- Commits up to and including `bb22c42` (configure localhost) pass
- Commits `d647beb` through `14e0ff0` all fail
This means the bug was introduced in the first failing commit, which is `d647beb` with the subject "switch to prod".
The answer is: **d647beb**
3175aac
JSON Schema
Structured Output
(raw) {"commit": "3175aac"}
Invalid JSON. Output: 689b37e
JSON Schema
Structured Output
Structured Output Error
Failed to parse structured JSON response: Extra data: line 1 column 4 (char 3)
Failure: Failed to parse structured JSON response: Extra data: line 1 column 4 (char 3)
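The wording of the error above matches the message format of Python's json module. Assuming that is the parser involved (the page does not say), the failure can be reproduced: a bare hash such as 689b37e begins with the digits 689, which parse as a complete JSON number, and the remaining characters are then rejected as extra data at character offset 3. The variable names below are illustrative only.

```python
import json

raw = "689b37e"  # the bare hash reported in the invalid output

try:
    json.loads(raw)
except json.JSONDecodeError as err:
    # "689" is read as a number, then "b37e" is left over.
    message = str(err)
    print(message)

# Wrapping the hash in the object shape used by the other outputs
# produces text that parses cleanly.
wrapped = json.dumps({"commit": raw})
print(json.loads(wrapped)["commit"])
```

The other structured outputs on this page parse because they arrive as an object with a "commit" key, not as a bare token.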