GitBench
Deepseek / deepseek-v4-flash

high

Text: 90.7% (185 / 204 fixtures), 1 run
34,780 input tokens / 66,687 output tokens (63,976 of them reasoning), $0.02002330
JSON: 61.8% (126 / 204 fixtures), 1 run
24,989 input tokens / 44,696 output tokens (42,063 of them reasoning), $0.01536307
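The token and cost figures for the two runs can be compared with a little arithmetic. The sketch below is illustrative only: the `RunStats` class and its field names are hypothetical, and it assumes the dollar amount is the total cost of the whole run rather than a per-fixture figure, which the page does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStats:
    """Hypothetical container for one run's headline numbers."""
    passed: int
    fixtures: int
    output_tokens: int
    reasoning_tokens: int
    cost_usd: float  # assumption: total cost of the run

    @property
    def reasoning_share(self) -> float:
        # Fraction of output tokens that were spent on reasoning.
        return self.reasoning_tokens / self.output_tokens

    @property
    def cost_per_pass(self) -> float:
        # Dollars spent per passing fixture.
        return self.cost_usd / self.passed


# Values copied from the two result lines above.
text_run = RunStats(185, 204, 66_687, 63_976, 0.02002330)
json_run = RunStats(126, 204, 44_696, 42_063, 0.01536307)

for name, run in (("text", text_run), ("json", json_run)):
    print(f"{name}: reasoning share {run.reasoning_share:.1%}, "
          f"cost per pass ${run.cost_per_pass:.6f}")
```

Under that assumption, the JSON run is cheaper in total but costs more per passing fixture, because it passes far fewer fixtures.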
Pass Rate Delta
-28.9% Text: 90.7% → JSON: 61.8%
+7 Gained (JSON pass / text fail)
−66 Lost (text pass / JSON fail)
119 Unchanged Pass (both pass)
12 Unchanged Fail (both fail)
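The four buckets above should account for every fixture and reproduce both headline pass rates. A minimal sketch of that consistency check follows; the function name `reconcile` and the dictionary keys are hypothetical, and only the four counts come from the page.

```python
def reconcile(gained: int, lost: int, both_pass: int, both_fail: int) -> dict:
    """Rebuild the headline numbers from the four transition buckets."""
    total = gained + lost + both_pass + both_fail
    text_pass = both_pass + lost      # passed in text mode
    json_pass = both_pass + gained    # passed in JSON mode
    return {
        "total": total,
        "text_pass": text_pass,
        "json_pass": json_pass,
        "text_rate": round(100 * text_pass / total, 1),
        "json_rate": round(100 * json_pass / total, 1),
        # Delta in percentage points, JSON minus text.
        "delta_pp": round(100 * (json_pass - text_pass) / total, 1),
        "changed": gained + lost,
    }


summary = reconcile(gained=7, lost=66, both_pass=119, both_fail=12)
print(summary)
```

With the counts from the page this gives 204 fixtures, 185 text passes, 126 JSON passes, a delta of -28.9 percentage points, and 73 changed fixtures, matching the figures reported elsewhere on the page.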
Fixture Reliability Delta
Fixture Text JSON Delta
f011 0% (0/1) 100% (1/1) -100%
Benchmark Deltas
Benchmark Text JSON Delta
branch_cleanup 100% 25% -75%
reflog 100% 50% -50%
blame_forensics 100% 58.3% -41.7%
commit_messages 91.7% 50% -41.7%
tag_management 91.7% 58.3% -33.3%
git_grep 100% 66.7% -33.3%
git_log_format 100% 66.7% -33.3%
stash_recovery 100% 66.7% -33.3%
worktree_usage 83.3% 50% -33.3%
cherry_pick 75% 50% -25%
commit_squash 91.7% 66.7% -25%
git_bisect 100% 75% -25%
merge_conflicts 75% 50% -25%
git_clean 91.7% 83.3% -8.3%
git_show 91.7% 100% +8.3%
rebase 75% 66.7% -8.3%
submodule_usage 75% 66.7% -8.3%
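The per-benchmark percentages can be cross-checked against the overall totals. The sketch below assumes each of the 17 benchmarks has 12 fixtures (204 / 17); the page does not state this, but percentages such as 58.3% and 91.7% are consistent with it. The names `RATES` and `to_count` are hypothetical.

```python
FIXTURES_PER_BENCHMARK = 12  # assumption: 204 fixtures / 17 benchmarks

# (text %, JSON %) copied from the Benchmark Deltas table.
RATES = {
    "branch_cleanup": (100.0, 25.0),
    "reflog": (100.0, 50.0),
    "blame_forensics": (100.0, 58.3),
    "commit_messages": (91.7, 50.0),
    "tag_management": (91.7, 58.3),
    "git_grep": (100.0, 66.7),
    "git_log_format": (100.0, 66.7),
    "stash_recovery": (100.0, 66.7),
    "worktree_usage": (83.3, 50.0),
    "cherry_pick": (75.0, 50.0),
    "commit_squash": (91.7, 66.7),
    "git_bisect": (100.0, 75.0),
    "merge_conflicts": (75.0, 50.0),
    "git_clean": (91.7, 83.3),
    "git_show": (91.7, 100.0),
    "rebase": (75.0, 66.7),
    "submodule_usage": (75.0, 66.7),
}


def to_count(pct: float, n: int = FIXTURES_PER_BENCHMARK) -> int:
    """Convert a rounded percentage back to a pass count out of n."""
    count = round(pct / 100 * n)
    # Reject percentages that do not correspond to a whole count out of n.
    if abs(100 * count / n - pct) > 0.05:
        raise ValueError(f"{pct}% is not a whole number of passes out of {n}")
    return count


counts = {name: (to_count(t), to_count(j)) for name, (t, j) in RATES.items()}
text_total = sum(t for t, _ in counts.values())
json_total = sum(j for _, j in counts.values())
print(text_total, json_total)  # prints 185 126
```

Under the 12-fixtures assumption the per-benchmark counts sum to 185 and 126, the same pass totals reported for the text and JSON runs.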
Changed Fixtures (73)