GitBench
Poolside / laguna-xs.2

high

Text: 88.2% (180 / 204 fixtures, 1 run)
49,024 input tokens / 62,164 output tokens (54,469 of them reasoning) · $0.01617040
JSON: 88.7% (181 / 204 fixtures, 1 run)
48,973 input tokens / 61,806 output tokens (57,590 of them reasoning) · $0.01604410
Pass Rate Delta: +0.5 pp (Text: 88.2% → JSON: 88.7%)
Gained: +13 (JSON pass / text fail)
Lost: −12 (text pass / JSON fail)
Unchanged Pass: 168 (both pass)
Unchanged Fail: 11 (both fail)
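The four counts above partition the 204 fixtures by their (text, JSON) outcome pair, so they should reconcile with the headline numbers: 168 + 12 = 180 text passes, 168 + 13 = 181 JSON passes, and the net gain of 13 - 12 = 1 fixture is 1/204, about 0.49 percentage points. A minimal Python sketch of that bookkeeping follows. It assumes one pass/fail result per fixture per mode (the page reports a single run); the fixture IDs and the helper names `compare_modes` and `pass_rate` are hypothetical and not part of GitBench.

```python
from collections import Counter

def compare_modes(text: dict[str, bool], json_: dict[str, bool]) -> Counter:
    """Classify each fixture by its (text, JSON) pass/fail pair."""
    if text.keys() != json_.keys():
        raise ValueError("both modes must cover the same fixtures")
    labels = {
        (True, True): "unchanged_pass",
        (False, False): "unchanged_fail",
        (False, True): "gained",  # JSON pass / text fail
        (True, False): "lost",    # text pass / JSON fail
    }
    return Counter(labels[(text[f], json_[f])] for f in text)

def pass_rate(results: dict[str, bool]) -> float:
    """Percentage of fixtures that passed."""
    return 100 * sum(results.values()) / len(results)

# Hypothetical fixture IDs, laid out to reproduce the page's counts.
outcomes = ([(True, True)] * 168 + [(False, True)] * 13
            + [(True, False)] * 12 + [(False, False)] * 11)
text = {f"fixture_{i:03d}": t for i, (t, _) in enumerate(outcomes)}
json_mode = {f"fixture_{i:03d}": j for i, (_, j) in enumerate(outcomes)}

counts = compare_modes(text, json_mode)
print(counts["gained"], counts["lost"], counts["unchanged_pass"], counts["unchanged_fail"])
# 13 12 168 11
print(f"{pass_rate(text):.1f}% -> {pass_rate(json_mode):.1f}%")  # 88.2% -> 88.7%
print(f"{pass_rate(json_mode) - pass_rate(text):+.1f} pp")      # +0.5 pp
```

Because gained and lost fixtures largely cancel, a near-zero headline delta can hide substantial per-fixture churn (25 of 204 fixtures changed outcome here).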
Benchmark Deltas
Benchmark          Text     JSON     Delta (pp)
commit_squash      50%      100%     +50
reflog             100%     50%      -50
rebase             75%      58.3%    -16.7
worktree_usage     83.3%    100%     +16.7
cherry_pick        66.7%    83.3%    +16.7
git_show           83.3%    91.7%    +8.3
branch_cleanup     100%     91.7%    -8.3
git_clean          75%      83.3%    +8.3
merge_conflicts    83.3%    75%      -8.3
tag_management     100%     91.7%    -8.3
blame_forensics    100%     100%     0
commit_messages    100%     100%     0
git_bisect         100%     100%     0
git_grep           100%     100%     0
git_log_format     100%     100%     0
stash_recovery     100%     100%     0
submodule_usage    83.3%    83.3%    0
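Every percentage in this table is a multiple of 1/12, and 204 fixtures / 17 benchmarks = 12. This suggests, though the page does not state it, that each benchmark has 12 fixtures. Under that assumption, the sketch below back-computes per-benchmark pass counts from the table and checks that they sum to the headline 180 (text) and 181 (JSON). `FIXTURES_PER_BENCHMARK`, `PASSED`, and `delta_pp` are hypothetical names used only for illustration.

```python
FIXTURES_PER_BENCHMARK = 12  # assumption inferred from 204 / 17, not stated on the page

# (text passed, JSON passed) per benchmark, back-computed from the table's
# percentages under the 12-fixture assumption.
PASSED = {
    "commit_squash": (6, 12), "reflog": (12, 6), "rebase": (9, 7),
    "worktree_usage": (10, 12), "cherry_pick": (8, 10), "git_show": (10, 11),
    "branch_cleanup": (12, 11), "git_clean": (9, 10), "merge_conflicts": (10, 9),
    "tag_management": (12, 11), "blame_forensics": (12, 12),
    "commit_messages": (12, 12), "git_bisect": (12, 12), "git_grep": (12, 12),
    "git_log_format": (12, 12), "stash_recovery": (12, 12),
    "submodule_usage": (10, 10),
}

def delta_pp(text_passed: int, json_passed: int,
             n: int = FIXTURES_PER_BENCHMARK) -> tuple[float, float, float]:
    """Return (text %, JSON %, delta in percentage points), each rounded to 0.1.

    The delta is taken from the unrounded percentages and rounded once.
    """
    text_pct = 100 * text_passed / n
    json_pct = 100 * json_passed / n
    return round(text_pct, 1), round(json_pct, 1), round(json_pct - text_pct, 1)

print(delta_pp(*PASSED["rebase"]))          # (75.0, 58.3, -16.7)
print(delta_pp(*PASSED["cherry_pick"]))     # (66.7, 83.3, 16.7)
print(sum(t for t, _ in PASSED.values()))   # 180
print(sum(j for _, j in PASSED.values()))   # 181
```

Rounding only once matters: subtracting the displayed values for cherry_pick gives 83.3 - 66.7 = 16.6, while the table (and the sketch) report +16.7 from the unrounded 10/12 - 8/12.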
Changed Fixtures (25)