Linear history with bad commit at end - model identifies most recent commit as bad
Tests the ability to identify the most recent commit as bad using git bisect on a linear history. Evaluates the basic bisect case in which the bug is at HEAD.
Baseline Repository
These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.
git init
git config user.email 'test@test.com'
git config user.name 'Test User'
echo '#!/bin/bash' > test.sh
echo 'grep "return a + b" src/index.js' >> test.sh
chmod +x test.sh
mkdir -p src && echo 'function add(a, b) { return a + b; }' > src/index.js
git add test.sh src/index.js
git commit -m 'Initial commit'
echo '// feature X enabled' >> src/index.js
git add src/index.js
git commit -m 'add feature X'
echo 'function add(a, b) { return a - b; }' > src/index.js
git add src/index.js
git commit -m 'change add operation'
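The description refers to git bisect, so the following is a minimal sketch of how this baseline history could be bisected. It assumes the root commit is known good and HEAD is known bad, and it rebuilds the repository in a scratch directory so it can run on its own. The scratch directory and the variable names (repo, root, bad_hash, bad_subject) are illustrative and are not part of the test definition.

```shell
#!/bin/bash
set -euo pipefail

# Recreate the baseline repository in a scratch directory (illustrative location).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email 'test@test.com'
git config user.name 'Test User'
echo '#!/bin/bash' > test.sh
echo 'grep "return a + b" src/index.js' >> test.sh
chmod +x test.sh
mkdir -p src && echo 'function add(a, b) { return a + b; }' > src/index.js
git add test.sh src/index.js
git commit -q -m 'Initial commit'
echo '// feature X enabled' >> src/index.js
git add src/index.js
git commit -q -m 'add feature X'
echo 'function add(a, b) { return a - b; }' > src/index.js
git add src/index.js
git commit -q -m 'change add operation'

# Assumption: the root commit is good and HEAD is bad.
root=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$root" >/dev/null

# test.sh exits 0 when the correct implementation is present (good)
# and 1 when grep finds no match (bad), which is what bisect run expects.
git bisect run ./test.sh >/dev/null

# After bisect converges, refs/bisect/bad points at the first bad commit.
bad_subject=$(git log -1 --format=%s refs/bisect/bad)
bad_hash=$(git rev-parse --short=7 refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "$bad_hash $bad_subject"
```

With only three commits, bisect needs to test just the middle commit ('add feature X'); since that passes, HEAD is reported as the first bad commit. The abbreviated hash printed will differ from run to run, while the subject line stays the same.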
Prompt
You are debugging a git repository. The following commit history has one bad commit that broke the build. Identify the bad commit hash (first 7 characters) or commit subject line that introduced the bug. Use the git log and test output to determine which commit is bad.
Your answer should be the commit hash (e.g., 'abc1234') or the commit subject line.
Expected
change add operation
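The prompt accepts either a commit hash or the subject line, while the expected value is given only as a subject. The actual grading logic is not shown here, so the following is only a sketch of one way such an answer could be checked. The function name matches_expected and its arguments are hypothetical, and it assumes the full hash of the bad commit is read from the repository created for that run (commit hashes depend on commit timestamps, so they differ between runs).

```shell
# Hypothetical helper, not the actual grader: succeeds when the answer names
# the expected commit by subject line or by a hash prefix of 7+ hex characters.
matches_expected() {
  local answer=$1 full_hash=$2 subject=$3

  # Accept the exact subject line.
  if [ "$answer" = "$subject" ]; then
    return 0
  fi

  # Accept a lowercase hex prefix (7 to 40 characters) of the full hash.
  if [[ "$answer" =~ ^[0-9a-f]{7,40}$ ]] && [[ "$full_hash" == "$answer"* ]]; then
    return 0
  fi

  return 1
}
```

In a real run, the full hash could be obtained with git rev-parse HEAD in the baseline repository, since the bad commit is at HEAD.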
Model Outputs (14)
345a9ad
4a66070
JSON Schema
Structured Output
(raw) { "commit": "4a66070" }
The bad commit is `cdff16f`.
f6dc4d9
JSON Schema
Structured Output
(raw) { "commit": "f6dc4d9" }
The bad commit is `a0f84ca` (or "change add operation"). This is the first commit in the history (when read from newest to oldest) that caused the test to fail. The previous commits (`e44ac68` and `e74e10e`) passed the tests, so the bug was introduced by `a0f84ca`.
942fc98
JSON Schema
Structured Output
(raw) {"commit": "942fc98"}
c3781f8
b8ad21c
JSON Schema
Structured Output
(raw) { "commit": "b8ad21c" }
71edd23
995e438
JSON Schema
Structured Output
(raw) { "commit": "995e438" }
Based on the commit history and test results:
- **c63a890** (Initial commit): PASS
- **f768c28** (add feature X): PASS
- **2c8fdd3** (change add operation): FAIL
The bad commit is **2c8fdd3** ("change add operation") as it is the first commit that introduced test failures.
**Answer: 2c8fdd3**
ddee6bd
JSON Schema
Structured Output
(raw) {"commit": "ddee6bd"}
Looking at the test results, I can see that the build passed for the first two commits but failed on the third commit.
- `e86ac82 Initial commit`: PASS
- `82e8655 add feature X`: PASS
- `cfc7706 change add operation`: FAIL
The commit that introduced the bug is `cfc7706` with the subject "change add operation".
cfc7706
4b51354
JSON Schema
Structured Output
(raw) {"commit": "4b51354"}