Branching history with bad commit that breaks import - model identifies the broken import
Tests ability to identify a commit that broke an import via bisect. Evaluates bisect for a build-breaking regression.
Baseline Repository
These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.
- 01
git init - 02
git config user.email 'test@test.com' - 03
git config user.name 'Test User' - 04
echo '#!/bin/bash' > test.sh - 05
echo 'grep -q "from utils import" main.py' >> test.sh - 06
chmod +x test.sh - 07
echo 'from utils import helpers' > main.py - 08
git add test.sh main.py && git commit -m 'Initial commit' - 09
echo '# helper import configured' >> main.py - 10
git add main.py && git commit -m 'add import' - 11
echo 'import os' > main.py - 12
git add main.py && git commit -m 'replace helper import' - 13
git checkout -b feature - 14
echo '# add logging hook' >> main.py - 15
git add main.py && git commit -m 'add logging hook' - 16
git checkout main - 17
git merge feature
Prompt
You are debugging a git repository. The following commit history has one bad commit that broke the build. Identify the bad commit hash (first 7 characters) or commit subject line that introduced the bug. Use the git log and test output to determine which commit is bad.
Your answer should be the commit hash (e.g., 'abc1234') or the commit subject line.
Expected
replace helper import
Campaign Evidence
Loading campaign evidence…
Model Outputs (14)
8c517d6
52b4aac
JSON Schema
Structured Output
(raw) { "commit": "52b4aac" }
The bad commit that introduced the bug is **7f98d78** ("replace helper import"), as this is the first commit where the tests started failing (exit code 1) and all subsequent commits also failed.
3223ac8
JSON Schema
Structured Output
(raw) { "commit": "3223ac8" }
The bad commit is `09b126b` (or its subject line "replace helper import"), as it is the first commit in the history where the tests start failing (exit code 1). All commits before it passed the tests, and all commits after it also failed, indicating that this commit introduced the bug.
3b6b4b0
JSON Schema
Structured Output
(raw) {"commit": "3b6b4b0"}
f7a8748 replace helper import
b3f6043 replace helper import
JSON Schema
Structured Output
(raw) {"commit":"b3f6043 replace helper import"}
38683ab replace helper import
b0e3c98
JSON Schema
Structured Output
(raw) {
"commit": "b0e3c98"
}
Looking at the test results, I can see when the failure was introduced:
- **e98f645** (Initial commit): PASS
- **708ac1b** (add import): PASS
- **0dd4e3b** (replace helper import): FAIL ← First failure
- **1477338** (add logging hook): FAIL
The bug was introduced in commit **0dd4e3b** with the subject line **"replace helper import"**. This is the first commit where the tests failed, indicating it's the problematic commit.
5faec08
JSON Schema
Structured Output
(raw) {"commit": "5faec08"}
Looking at the test results, I can see that:
- `f76b34b Initial commit`: PASS
- `89cca12 add import`: PASS
- `0f26780 replace helper import`: FAIL
- `a650f3b add logging hook`: FAIL
The build was passing with `89cca12 add import` but failed with the next commit `0f26780 replace helper import`. This commit introduced the bug.
The bad commit is: **0f26780** (replace helper import)
78fd31c
JSON Schema
Structured Output
(raw) {"commit": "78fd31c"}