Linear history with bad commit in the middle - model identifies middle commit as bad
Tests the model's ability to identify a bad commit in the middle of a linear history. Evaluates bisect-style reasoning when the bug was introduced at neither the first nor the last commit.
Baseline Repository
These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.
git init
git config user.email 'test@test.com'
git config user.name 'Test User'
echo '#!/bin/bash' > test.sh
echo 'grep "42" config.json' >> test.sh
chmod +x test.sh
echo '{"value": 42}' > config.json
git add test.sh config.json
git commit -m 'Initial commit'
echo '{"value": 42, "name": "default"}' > config.json
git add config.json
git commit -m 'set config value'
echo '{"value": 0}' > config.json
git add config.json
git commit -m 'reset config to zero'
echo '{"value": 100}' > config.json
git add config.json
git commit -m 'final config'
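The per-commit test output that the prompt refers to can be reproduced by running test.sh at every commit, oldest first. The following is a minimal sketch, not the harness's actual tooling: the scratch directory, the `commit_config` helper, and the `results` and `first_bad_subject` variables are names introduced here for illustration.

```shell
#!/bin/bash
# Sketch: rebuild the baseline history in a scratch directory, then run
# test.sh at each commit (oldest first) and print a PASS/FAIL line per commit.
set -u

workdir=$(mktemp -d)   # scratch location (assumption of this sketch)
cd "$workdir" || exit 1

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'

# Same files and commit subjects as the baseline commands.
echo '#!/bin/bash' > test.sh
echo 'grep "42" config.json' >> test.sh
chmod +x test.sh
echo '{"value": 42}' > config.json
git add test.sh config.json
git commit -q -m 'Initial commit'

# Hypothetical helper: overwrite config.json and commit it.
commit_config() {
    echo "$1" > config.json
    git add config.json
    git commit -q -m "$2"
}
commit_config '{"value": 42, "name": "default"}' 'set config value'
commit_config '{"value": 0}' 'reset config to zero'
commit_config '{"value": 100}' 'final config'

branch=$(git symbolic-ref --short HEAD)   # remember where to return to
results=''
first_bad_subject=''
for rev in $(git rev-list --reverse HEAD); do
    git checkout -q "$rev"
    subject=$(git log -1 --format=%s "$rev")
    # test.sh exits 0 only when config.json contains the string 42.
    if ./test.sh > /dev/null 2>&1; then
        status=PASS
    else
        status=FAIL
        # Record only the first failing commit.
        if [ -z "$first_bad_subject" ]; then
            first_bad_subject=$subject
        fi
    fi
    echo "$(git rev-parse --short=7 "$rev") $subject: $status"
    results="${results:+$results }$status"
done
git checkout -q "$branch"
```

Because test.sh only greps for the string 42, the last two commits fail, and the first failing commit is the one the task expects. The hashes printed will differ between runs, since they depend on commit timestamps.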
Prompt
You are debugging a git repository. The following commit history has one bad commit that broke the build. Identify the bad commit hash (first 7 characters) or commit subject line that introduced the bug. Use the git log and test output to determine which commit is bad.
Your answer should be the commit hash (e.g., 'abc1234') or the commit subject line.
Expected
reset config to zero
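The expected answer can be cross-checked with `git bisect run`, which treats exit code 0 from the test script as good and a non-zero code (other than 125) as bad. This is a sketch under the assumption that the history is rebuilt in a scratch directory; `commit_config` and `bisect_subject` are names introduced here, not part of the test case.

```shell
#!/bin/bash
# Sketch: rebuild the baseline history, then let git bisect find the first
# commit at which test.sh fails.
set -u

workdir=$(mktemp -d)   # scratch location (assumption of this sketch)
cd "$workdir" || exit 1

git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'

echo '#!/bin/bash' > test.sh
echo 'grep "42" config.json' >> test.sh
chmod +x test.sh
echo '{"value": 42}' > config.json
git add test.sh config.json
git commit -q -m 'Initial commit'

# Hypothetical helper: overwrite config.json and commit it.
commit_config() {
    echo "$1" > config.json
    git add config.json
    git commit -q -m "$2"
}
commit_config '{"value": 42, "name": "default"}' 'set config value'
commit_config '{"value": 0}' 'reset config to zero'
commit_config '{"value": 100}' 'final config'

# The root commit passes the test and HEAD fails it, so they bound the search.
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" > /dev/null
git bisect run ./test.sh > /dev/null

# While the bisect session is open, refs/bisect/bad names the first bad commit.
bisect_subject=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $(git rev-parse --short=7 refs/bisect/bad) $bisect_subject"

git bisect reset > /dev/null 2>&1
```

The subject must be read before `git bisect reset`, because resetting removes the `refs/bisect/` references.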
Model Outputs (14)
7fbfcdd
936ee1b
JSON Schema
Structured Output
(raw) { "commit": "936ee1b" }
The bug was introduced in commit `09d40f8` (reset config to zero), as this is the first commit that resulted in a FAIL (exit code 1) when tested.
a589cb3
JSON Schema
Structured Output
(raw) { "commit": "a589cb3" }
Based on the test results, the build started failing at commit `f06f971` ("reset config to zero") and continued to fail in the subsequent commit `1b59dc3` ("final config").
Since the test was passing in `1bcd0e8` ("set config value") but started failing in `f06f971`, the bad commit is:
`f06f971` (or "reset config to zero")
6b5664b
JSON Schema
Structured Output
(raw) {"commit": "6b5664b"}
8718561
bd6e853
JSON Schema
Structured Output
(raw) {
"commit": "bd6e853"
}
cb5329d
JSON Schema
Structured Output
(raw) {
"commit": "cb5329d"
}
Looking at the test results chronologically:
- e2c8528 Initial commit: PASS
- 9128ccc set config value: PASS
- e25177e reset config to zero: FAIL
- 7735d10 final config: FAIL
The bad commit is **e25177e** (first 7 characters: `e25177e`). This is the first commit where the tests started failing.
b005af0
JSON Schema
Structured Output
(raw) {
"commit": "b005af0"
}
Looking at the test results, I can see that:
1. `9a43286 Initial commit` - PASS
2. `d1c0225 set config value` - PASS
3. `99fdd39 reset config to zero` - FAIL
The first commit where the tests failed is `99fdd39 reset config to zero`. This commit introduced the bug since all previous commits passed their tests.
**Answer: 99fdd39**
080f1c3
JSON Schema
Structured Output
(raw) {"commit": "080f1c3"}
eee641d final config
Failure: Expected target hash/prefix or subject 'reset config to zero'
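The failure message suggests that an answer is accepted when it names either the target commit's hash prefix or its subject line. A hypothetical check along those lines might look like the sketch below. The function name `answer_matches`, the substring matching rule, and the use of `99fdd39` (the hash one sample reported for 'reset config to zero') as the target are all assumptions for illustration; the actual grader is not shown in this page.

```shell
#!/bin/bash
# Hypothetical grader sketch: accept an answer that contains either the
# 7-character prefix of the target hash or the target subject line.
answer_matches() {
    local answer=$1 target_hash=$2 target_subject=$3
    local prefix=${target_hash:0:7}

    # Refuse to match on an empty hash or subject, which would match anything.
    if [ -z "$prefix" ] || [ -z "$target_subject" ]; then
        return 1
    fi

    case "$answer" in
        *"$prefix"*|*"$target_subject"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: hash value is illustrative; real hashes differ between runs.
if answer_matches 'eee641d final config' '99fdd39' 'reset config to zero'; then
    verdict_wrong=PASS
else
    verdict_wrong=FAIL
fi
if answer_matches '**Answer: 99fdd39**' '99fdd39' 'reset config to zero'; then
    verdict_hash=PASS
else
    verdict_hash=FAIL
fi
echo "wrong answer: $verdict_wrong, hash answer: $verdict_hash"
```

Substring matching is chosen here so that answers wrapped in explanatory prose still count; a stricter grader could require an exact match instead.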