GitBench
Show merge commit and its parents
Tests the ability to inspect a merge commit and identify its parents, which evaluates comprehension of merge-commit structure.

These commands set up the repo before the model sees the prompt. They define the starting file structure, staged changes, and Git history.

  1. git init
  2. git config user.email 'test@test.com'
  3. git config user.name 'Test User'
  4. echo 'base' > shared.txt
  5. git add shared.txt
  6. git commit -m 'Base commit'
  7. git checkout -b feature
  8. echo 'feature' > feature.txt
  9. git add feature.txt
  10. git commit -m 'Feature work'
  11. git checkout main
  12. echo 'main change' > shared.txt
  13. git add shared.txt
  14. git commit -m 'Main line work'
  15. git merge feature -m 'Merge feature branch'
Prompt
Using git show --merges -s --format=%P, how many parent commits does the merge commit have? Output ONLY the number, nothing else.
Expected
2
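The expected answer can be checked by hand. The following is a minimal sketch, not part of the benchmark harness: it replays the setup commands in a throwaway directory and counts the hashes that `%P` prints for the merge commit at `HEAD`. The temporary directory, the `parents` and `parent_count` variable names, and the `git symbolic-ref` line (added so that `git checkout main` works whatever the local default branch name is) are assumptions introduced for illustration.

```shell
#!/bin/sh
set -e

# Throwaway repository in a temporary directory (hypothetical location).
repo=$(mktemp -d)
cd "$repo"

git init -q
# Assumption: point the unborn HEAD at "main" so the later
# "git checkout main" works regardless of init.defaultBranch.
git symbolic-ref HEAD refs/heads/main
git config user.email 'test@test.com'
git config user.name 'Test User'

# Base commit on main.
echo 'base' > shared.txt
git add shared.txt
git commit -q -m 'Base commit'

# One commit on the feature branch.
git checkout -q -b feature
echo 'feature' > feature.txt
git add feature.txt
git commit -q -m 'Feature work'

# One diverging commit on main, so the merge cannot fast-forward.
git checkout -q main
echo 'main change' > shared.txt
git add shared.txt
git commit -q -m 'Main line work'

git merge -q feature -m 'Merge feature branch'

# %P prints the full parent hashes of the commit, space-separated.
parents=$(git show -s --format=%P HEAD)
parent_count=$(echo "$parents" | wc -w | tr -d ' ')
echo "$parent_count"
```

Because both branches gained a commit after the base, the merge has to create a real merge commit, and that commit records one parent per merged line of history.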
deepseek/deepseek-v4-flash:high PASS 100% 263 in → 120 out (96 reasoning)
2
deepseek/deepseek-v4-flash:high__json_schema PASS 100% 264 in → 120 out (97 reasoning)
2
JSON Schema Structured Output
(raw) { "count": 2 }
deepseek/deepseek-v4-flash:none PASS 100% 258 in → 2 out (0 reasoning)
2
deepseek/deepseek-v4-flash:none__json_schema PASS 100% 259 in → 8 out (0 reasoning)
2
JSON Schema Structured Output
(raw) { "count": 2 }
mistralai/devstral-2512 PASS 100% 319 in → 2 out
2
mistralai/devstral-2512__json_schema PASS 100% 315 in → 7 out
2
JSON Schema Structured Output
(raw) {"count": 2}
nvidia/nemotron-3-nano-30b-a3b:high PASS 100% 324 in → 233 out (191 reasoning)
2
nvidia/nemotron-3-nano-30b-a3b:high__json_schema PASS 100% 325 in → 206 out (191 reasoning)
2
JSON Schema Structured Output
(raw) { "count": 2 }
nvidia/nemotron-3-nano-30b-a3b:none PASS 100% 321 in → 2 out (0 reasoning)
2
nvidia/nemotron-3-nano-30b-a3b:none__json_schema PASS 100% 329 in → 12 out (0 reasoning)
2
JSON Schema Structured Output
(raw) { "count": 2 }
poolside/laguna-xs.2:high PASS 100% 369 in → 385 out (381 reasoning)
2
poolside/laguna-xs.2:high__json_schema PASS 100% 360 in → 358 out (346 reasoning)
2
JSON Schema Structured Output
(raw) { "count": 2 }
poolside/laguna-xs.2:none PASS 100% 365 in → 3 out (0 reasoning)
2
poolside/laguna-xs.2:none__json_schema PASS 100% 363 in → 7 out (0 reasoning)
2
JSON Schema Structured Output
(raw) {"count": 2}