Bad commit changes multiple files (app.py and config.py) - model identifies the multi-file bad commit
Tests whether the model can identify, via bisect, a bad commit whose changes span multiple files (app.py and config.py).
Baseline Repository
These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.
git init
git config user.email 'test@test.com'
git config user.name 'Test User'
echo '#!/bin/bash' > test.sh
echo 'grep -q "DEBUG = False" app.py && grep -q "DEBUG = False" config.py' >> test.sh
chmod +x test.sh
echo 'DEBUG = False' > app.py && echo 'DEBUG = False' > config.py
git add test.sh app.py config.py && git commit -m 'Initial commit'
echo '# debug defaults' >> config.py
git add app.py config.py && git commit -m 'set debug off'
echo 'DEBUG = True' > app.py && echo 'DEBUG = True' > config.py
git add app.py config.py && git commit -m 'enable debug mode'
echo '# verbose startup logs' >> app.py
git add app.py && git commit -m 'add startup logging'
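For reference, the bisect this test case is built around could be reproduced roughly as follows. This is a minimal sketch, not the harness's actual procedure: it assumes the baseline is rebuilt in a throwaway directory from `mktemp -d`, that the root commit is the known-good revision, and that `test.sh` serves as the bisect script. The `-q` flags and the variable names `repo`, `good`, and `bad_subject` are illustrative choices.

```shell
#!/bin/bash
set -e

# Rebuild the baseline history in a throwaway directory (assumption: mktemp is available).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'

# test.sh passes only while both files still contain "DEBUG = False".
echo '#!/bin/bash' > test.sh
echo 'grep -q "DEBUG = False" app.py && grep -q "DEBUG = False" config.py' >> test.sh
chmod +x test.sh

echo 'DEBUG = False' > app.py && echo 'DEBUG = False' > config.py
git add test.sh app.py config.py && git commit -q -m 'Initial commit'
echo '# debug defaults' >> config.py
git add app.py config.py && git commit -q -m 'set debug off'
echo 'DEBUG = True' > app.py && echo 'DEBUG = True' > config.py
git add app.py config.py && git commit -q -m 'enable debug mode'
echo '# verbose startup logs' >> app.py
git add app.py && git commit -q -m 'add startup logging'

# Treat HEAD as bad and the root commit as good, then let test.sh drive the search.
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" > /dev/null
git bisect run ./test.sh > /dev/null

# After a completed run, refs/bisect/bad points at the first bad commit.
bad_subject=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "$bad_subject"
```

Because `test.sh` checks both files, the commit that flips `DEBUG` in `app.py` and `config.py` together is the first one to fail, and the later `app.py`-only commit simply inherits the failure.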
Prompt
You are debugging a git repository. The following commit history has one bad commit that broke the build. Identify the bad commit hash (first 7 characters) or commit subject line that introduced the bug. Use the git log and test output to determine which commit is bad.
Your answer should be the commit hash (e.g., 'abc1234') or the commit subject line.
Expected
enable debug mode
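The commit hashes in the model outputs differ from run to run, while the subject line stays fixed, so an answer check presumably has to accept either form. The sketch below shows one way such a check could look. The function `matches_expected` and its arguments are hypothetical and are not taken from the actual grader; it assumes the bad commit's full hash is resolved from the repository at grading time.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if the answer mentions either the bad commit's
# subject line or the first 7 characters of its hash.
matches_expected() {
  local answer="$1" bad_hash="$2" bad_subject="$3"
  local short_hash="${bad_hash:0:7}"

  # Refuse to match on empty patterns, which would accept any answer.
  if [ -z "$short_hash" ] || [ -z "$bad_subject" ]; then
    return 1
  fi

  case "$answer" in
    *"$bad_subject"*) return 0 ;;
    *"$short_hash"*) return 0 ;;
  esac
  return 1
}
```

In a live repository the hash argument could come from something like `git log --format=%H --grep='enable debug mode'`, though how the real harness resolves it is not stated in the source.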
Model Outputs (14)
da2a5d2
6ed3388
JSON Schema
Structured Output
(raw) {
"commit": "6ed3388"
}
Based on the test results, the build was passing at commit `59fa610` ("set debug off") and started failing at commit `604b522` ("enable debug mode"). This means `604b522` is the first commit where the failure occurred, making it the bad commit.
**604b522**
2ef3383
JSON Schema
Structured Output
(raw) { "commit": "2ef3383" }
Based on the test results, the build started failing at commit `7e3a3c2 enable debug mode` and continued to fail in subsequent commits. The commit that introduced the bug is:
`7e3a3c2` (or "enable debug mode")
6911b33
JSON Schema
Structured Output
(raw) {"commit": "6911b33"}
38e6e07
4426dac
JSON Schema
Structured Output
(raw) {
"commit": "4426dac"
}
e8ea84d enable debug mode
0b2b999
JSON Schema
Structured Output
(raw) { "commit": "0b2b999" }
Looking at the test results chronologically:
- e7bebac Initial commit: PASS
- e26b2f4 set debug off: PASS
- bc11dd5 enable debug mode: FAIL
- 631e13d add startup logging: FAIL
The tests pass up until commit e26b2f4, but fail starting at bc11dd5. This means the bug was introduced in commit bc11dd5 ("enable debug mode").
23f4f0d
JSON Schema
Structured Output
(raw) {"commit": "23f4f0d"}
Looking at the test results, I can see the pattern:
- **96fbea7** (Initial commit): PASS
- **79f408b** (set debug off): PASS
- **e85ea5f** (enable debug mode): FAIL
- **0bbe5f4** (add startup logging): FAIL
The build started failing at commit **e85ea5f** "enable debug mode". The subsequent commit **0bbe5f4** "add startup logging" also fails, but that's because it's built on top of the already-broken state.
The bad commit is: **e85ea5f**
a1227a2
JSON Schema
Structured Output
(raw) {"commit": "a1227a2"}