Mistralai / devstral-2512
default
Reliability by Benchmark (Text)
Text vs JSON Schema Comparison
Pass Rate Delta: +23 percentage points (Text: 65.7% → JSON: 88.7%)
| Outcome | Fixtures | Meaning |
|---|---|---|
| Gained | +52 | JSON pass / text fail |
| Lost | −5 | Text pass / JSON fail |
| Unchanged Pass | 129 | Both pass |
| Unchanged Fail | 18 | Both fail |
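The headline pass rates can be reproduced from the four outcome counts. The sketch below is only a sanity check and assumes that each fixture falls into exactly one of the four buckets; the variable names are illustrative and not taken from the dashboard.

```python
# Fixture counts copied from the Text vs JSON Schema comparison summary.
gained = 52      # JSON pass / text fail
lost = 5         # text pass / JSON fail
both_pass = 129  # pass in both modes
both_fail = 18   # fail in both modes

# Assumption: the four buckets partition the full fixture set.
total = gained + lost + both_pass + both_fail

# A fixture passes in text mode if it is in "both pass" or "lost",
# and passes in JSON mode if it is in "both pass" or "gained".
text_rate = 100 * (both_pass + lost) / total
json_rate = 100 * (both_pass + gained) / total
delta = json_rate - text_rate

print(f"total={total} text={text_rate:.1f}% json={json_rate:.1f}% delta={delta:+.1f} pts")
```

Under that assumption the total comes to 204, matching the fixture gallery count, and the changed-fixture count is 52 + 5 = 57.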
Benchmark Deltas
| Benchmark | Text | JSON | Delta |
|---|---|---|---|
| worktree_usage | 0% | 100% | +100% |
| submodule_usage | 8.3% | 100% | +91.7% |
| tag_management | 33.3% | 91.7% | +58.3% |
| git_clean | 16.7% | 66.7% | +50% |
| commit_squash | 66.7% | 100% | +33.3% |
| branch_cleanup | 100% | 75% | -25% |
| rebase | 50% | 75% | +25% |
| cherry_pick | 58.3% | 75% | +16.7% |
| merge_conflicts | 58.3% | 75% | +16.7% |
| git_log_format | 83.3% | 91.7% | +8.3% |
| blame_forensics | 75% | 83.3% | +8.3% |
| git_grep | 75% | 83.3% | +8.3% |
| commit_messages | 100% | 100% | 0% |
| git_bisect | 100% | 100% | 0% |
| git_show | 91.7% | 91.7% | 0% |
| reflog | 100% | 100% | 0% |
| stash_recovery | 100% | 100% | 0% |
Changed Fixtures (57)
| Benchmark | Gained (JSON pass / text fail) | Lost (text pass / JSON fail) |
|---|---|---|
| blame_forensics | f001, f010 | f006 |
| branch_cleanup | - | f003, f009, f011 |
| cherry_pick | f004, f007 | - |
| commit_squash | f008, f009, f010, f011 | - |
| git_clean | f001, f003, f006, f007, f008, f009 | - |
| git_grep | f005, f011 | f006 |
| git_log_format | f006 | - |
| merge_conflicts | f007, f008 | - |
| rebase | f004, f007, f008 | - |
| submodule_usage | f001, f002, f003, f004, f005, f006, f007, f008, f009, f010, f012 | - |
| tag_management | f001, f002, f003, f006, f009, f010, f011 | - |
| worktree_usage | f001, f002, f003, f004, f005, f006, f007, f008, f009, f010, f011, f012 | - |
Fixture Gallery (204)
| Fixture | Score | Status | Input tokens | Output tokens |
|---|---|---|---|---|
| blame_forensics f001 | 0% | Failed | 247 | 17 |
| blame_forensics f002 | 100% | Passed | 238 | 5 |
| blame_forensics f003 | 100% | Passed | 279 | 7 |
| blame_forensics f004 | 100% | Passed | 192 | 6 |
| blame_forensics f005 | 0% | Failed | 267 | 7 |
| blame_forensics f006 | 100% | Passed | 230 | 4 |
| blame_forensics f007 | 100% | Passed | 217 | 6 |
| blame_forensics f008 | 100% | Passed | 249 | 5 |
| blame_forensics f009 | 100% | Passed | 202 | 6 |
| blame_forensics f010 | 0% | Failed | 428 | 5 |
| blame_forensics f011 | 100% | Passed | 245 | 4 |
| blame_forensics f012 | 100% | Passed | 146 | 7 |
| branch_cleanup f001 | 100% | Passed | 107 | 5 |
| branch_cleanup f002 | 100% | Passed | 120 | 6 |
| branch_cleanup f003 | 100% | Passed | 90 | 2 |
| branch_cleanup f004 | 100% | Passed | 127 | 6 |
| branch_cleanup f005 | 100% | Passed | 106 | 4 |
| branch_cleanup f006 | 100% | Passed | 130 | 10 |
| branch_cleanup f007 | 100% | Passed | 107 | 6 |
| branch_cleanup f008 | 100% | Passed | 122 | 11 |
| branch_cleanup f009 | 100% | Passed | 79 | 2 |
| branch_cleanup f010 | 100% | Passed | 116 | 6 |
| branch_cleanup f011 | 100% | Passed | 69 | 2 |
| branch_cleanup f012 | 100% | Passed | 162 | 12 |
| cherry_pick f001 | 100% | Passed | 95 | 5 |
| cherry_pick f002 | 100% | Passed | 122 | 9 |
| cherry_pick f003 | 100% | Passed | 91 | 5 |
| cherry_pick f004 | 0% | Failed | 165 | 54 |
| cherry_pick f005 | 0% | Failed | 131 | 49 |
| cherry_pick f006 | 100% | Passed | 144 | 27 |
| cherry_pick f007 | 0% | Failed | 140 | 30 |
| cherry_pick f008 | 100% | Passed | 124 | 7 |
| cherry_pick f009 | 100% | Passed | 109 | 6 |
| cherry_pick f010 | 0% | Failed | 224 | 29 |
| cherry_pick f011 | 100% | Passed | 165 | 26 |
| cherry_pick f012 | 0% | Failed | 275 | 48 |
| commit_messages f001 | 93.3% | Passed | 109 | 5 |
| commit_messages f002 | 87.7% | Passed | 226 | 11 |
| commit_messages f003 | 93.3% | Passed | 81 | 10 |
| commit_messages f004 | 96% | Passed | 118 | 4 |
| commit_messages f005 | 90% | Passed | 103 | 5 |
| commit_messages f006 | 92.7% | Passed | 149 | 7 |
| commit_messages f007 | 91% | Passed | 191 | 7 |
| commit_messages f008 | 88.3% | Passed | 74 | 7 |
| commit_messages f009 | 88.7% | Passed | 130 | 10 |
| commit_messages f010 | 88.3% | Passed | 298 | 10 |
| commit_messages f011 | 78.3% | Passed | 142 | 16 |
| commit_messages f012 | 89% | Passed | 121 | 8 |
| commit_squash f001 | 100% | Passed | 154 | 14 |
| commit_squash f002 | 100% | Passed | 114 | 8 |
| commit_squash f003 | 100% | Passed | 121 | 184 |
| commit_squash f004 | 100% | Passed | 118 | 14 |
| commit_squash f005 | 100% | Passed | 103 | 191 |
| commit_squash f006 | 100% | Passed | 90 | 93 |
| commit_squash f007 | 100% | Passed | 93 | 147 |
| commit_squash f008 | 100% | Failed | 108 | 113 |
| commit_squash f009 | 100% | Failed | 97 | 188 |
| commit_squash f010 | 100% | Failed | 112 | 109 |
| commit_squash f011 | 100% | Failed | 123 | 144 |
| commit_squash f012 | 100% | Passed | 122 | 157 |
| git_bisect f001 | 100% | Passed | 183 | 82 |
| git_bisect f002 | 100% | Passed | 209 | 102 |
| git_bisect f003 | 100% | Passed | 209 | 68 |
| git_bisect f004 | 100% | Passed | 215 | 60 |
| git_bisect f005 | 100% | Passed | 207 | 88 |
| git_bisect f006 | 100% | Passed | 235 | 67 |
| git_bisect f007 | 100% | Passed | 213 | 98 |
| git_bisect f008 | 100% | Passed | 203 | 78 |
| git_bisect f009 | 100% | Passed | 209 | 68 |
| git_bisect f010 | 100% | Passed | 211 | 57 |
| git_bisect f011 | 100% | Passed | 215 | 58 |
| git_bisect f012 | 100% | Passed | 239 | 95 |
| git_clean f001 | 50% | Failed | 48 | 14 |
| git_clean f002 | 100% | Passed | 58 | 10 |
| git_clean f003 | 33.3% | Failed | 54 | 10 |
| git_clean f004 | 33.3% | Failed | 56 | 11 |
| git_clean f005 | 50% | Failed | 46 | 10 |
| git_clean f006 | 75% | Failed | 73 | 14 |
| git_clean f007 | 50% | Failed | 68 | 15 |
| git_clean f008 | 25% | Failed | 72 | 10 |
| git_clean f009 | 75% | Failed | 63 | 12 |
| git_clean f010 | 100% | Passed | 60 | 10 |
| git_clean f011 | 40% | Failed | 59 | 11 |
| git_clean f012 | 50% | Failed | 62 | 11 |
| git_grep f001 | 100% | Passed | 49 | 4 |
| git_grep f002 | 100% | Passed | 63 | 2 |
| git_grep f003 | 100% | Passed | 82 | 2 |
| git_grep f004 | 100% | Passed | 72 | 6 |
| git_grep f005 | 0% | Failed | 106 | 2 |
| git_grep f006 | 100% | Passed | 36 | 2 |
| git_grep f007 | 0% | Failed | 156 | 3 |
| git_grep f008 | 100% | Passed | 55 | 2 |
| git_grep f009 | 100% | Passed | 49 | 4 |
| git_grep f010 | 100% | Passed | 59 | 2 |
| git_grep f011 | 0% | Failed | 104 | 2 |
| git_grep f012 | 100% | Passed | 60 | 9 |
| git_log_format f001 | 100% | Passed | 914 | 2 |
| git_log_format f002 | 100% | Passed | 883 | 8 |
| git_log_format f003 | 100% | Passed | 726 | 10 |
| git_log_format f004 | 100% | Passed | 888 | 2 |
| git_log_format f005 | 0% | Failed | 735 | 6 |
| git_log_format f006 | 0% | Failed | 914 | 2 |
| git_log_format f007 | 100% | Passed | 921 | 8 |
| git_log_format f008 | 100% | Passed | 1,240 | 2 |
| git_log_format f009 | 100% | Passed | 732 | 2 |
| git_log_format f010 | 100% | Passed | 1,386 | 2 |
| git_log_format f011 | 100% | Passed | 448 | 3 |
| git_log_format f012 | 100% | Passed | 397 | 2 |
| git_show f001 | 100% | Passed | 195 | 5 |
| git_show f002 | 0% | Failed | 193 | 106 |
| git_show f003 | 100% | Passed | 680 | 7 |
| git_show f004 | 100% | Passed | 463 | 7 |
| git_show f005 | 100% | Passed | 196 | 3 |
| git_show f006 | 100% | Passed | 207 | 2 |
| git_show f007 | 100% | Passed | 192 | 5 |
| git_show f008 | 100% | Passed | 204 | 36 |
| git_show f009 | 100% | Passed | 319 | 2 |
| git_show f010 | 100% | Passed | 187 | 2 |
| git_show f011 | 100% | Passed | 222 | 3 |
| git_show f012 | 100% | Passed | 332 | 2 |
| merge_conflicts f001 | 100% | Passed | 85 | 5 |
| merge_conflicts f002 | 100% | Passed | 109 | 9 |
| merge_conflicts f003 | 100% | Passed | 79 | 5 |
| merge_conflicts f004 | 100% | Passed | 150 | 26 |
| merge_conflicts f005 | 0% | Failed | 117 | 23 |
| merge_conflicts f006 | 100% | Passed | 130 | 27 |
| merge_conflicts f007 | 0% | Failed | 140 | 31 |
| merge_conflicts f008 | 0% | Failed | 106 | 55 |
| merge_conflicts f009 | 100% | Passed | 106 | 6 |
| merge_conflicts f010 | 0% | Failed | 189 | 45 |
| merge_conflicts f011 | 100% | Passed | 143 | 26 |
| merge_conflicts f012 | 0% | Failed | 238 | 48 |
| rebase f001 | 100% | Passed | 98 | 5 |
| rebase f002 | 100% | Passed | 131 | 9 |
| rebase f003 | 100% | Passed | 104 | 4 |
| rebase f004 | 0% | Failed | 163 | 117 |
| rebase f005 | 0% | Failed | 130 | 23 |
| rebase f006 | 100% | Passed | 145 | 27 |
| rebase f007 | 0% | Failed | 144 | 30 |
| rebase f008 | 0% | Failed | 121 | 32 |
| rebase f009 | 100% | Passed | 106 | 6 |
| rebase f010 | 0% | Failed | 214 | 38 |
| rebase f011 | 100% | Passed | 154 | 26 |
| rebase f012 | 0% | Failed | 271 | 48 |
| reflog f001 | 100% | Passed | 245 | 36 |
| reflog f002 | 100% | Passed | 247 | 318 |
| reflog f003 | 100% | Passed | 211 | 129 |
| reflog f004 | 100% | Passed | 348 | 312 |
| reflog f005 | 100% | Passed | 403 | 85 |
| reflog f006 | 100% | Passed | 260 | 182 |
| reflog f007 | 100% | Passed | 293 | 211 |
| reflog f008 | 100% | Passed | 387 | 147 |
| reflog f009 | 100% | Passed | 348 | 242 |
| reflog f010 | 100% | Passed | 289 | 257 |
| reflog f011 | 100% | Passed | 350 | 296 |
| reflog f012 | 100% | Passed | 339 | 255 |
| stash_recovery f001 | 100% | Passed | 138 | 67 |
| stash_recovery f002 | 100% | Passed | 306 | 119 |
| stash_recovery f003 | 100% | Passed | 123 | 137 |
| stash_recovery f004 | 100% | Passed | 119 | 80 |
| stash_recovery f005 | 100% | Passed | 130 | 60 |
| stash_recovery f006 | 100% | Passed | 182 | 83 |
| stash_recovery f007 | 100% | Passed | 214 | 96 |
| stash_recovery f008 | 100% | Passed | 181 | 171 |
| stash_recovery f009 | 100% | Passed | 216 | 98 |
| stash_recovery f010 | 100% | Passed | 128 | 78 |
| stash_recovery f011 | 100% | Passed | 122 | 64 |
| stash_recovery f012 | 100% | Passed | 137 | 77 |
| submodule_usage f001 | 0% | Failed | 39 | 15 |
| submodule_usage f002 | 0% | Failed | 98 | 15 |
| submodule_usage f003 | 0% | Failed | 102 | 59 |
| submodule_usage f004 | 0% | Failed | 88 | 10 |
| submodule_usage f005 | 0% | Failed | 46 | 37 |
| submodule_usage f006 | 0% | Failed | 87 | 10 |
| submodule_usage f007 | 0% | Failed | 47 | 25 |
| submodule_usage f008 | 33.3% | Failed | 93 | 14 |
| submodule_usage f009 | 0% | Failed | 91 | 10 |
| submodule_usage f010 | 0% | Failed | 99 | 13 |
| submodule_usage f011 | 100% | Passed | 101 | 5 |
| submodule_usage f012 | 0% | Failed | 46 | 18 |
| tag_management f001 | 0% | Failed | 55 | 12 |
| tag_management f002 | 0% | Failed | 53 | 24 |
| tag_management f003 | 50% | Failed | 60 | 15 |
| tag_management f004 | 100% | Passed | 68 | 8 |
| tag_management f005 | 100% | Passed | 79 | 13 |
| tag_management f006 | 0% | Failed | 62 | 19 |
| tag_management f007 | 100% | Passed | 60 | 33 |
| tag_management f008 | 33.3% | Failed | 66 | 31 |
| tag_management f009 | 0% | Failed | 68 | 23 |
| tag_management f010 | 0% | Failed | 46 | 13 |
| tag_management f011 | 0% | Failed | 45 | 11 |
| tag_management f012 | 100% | Passed | 86 | 15 |
| worktree_usage f001 | 25% | Failed | 145 | 15 |
| worktree_usage f002 | 0% | Failed | 149 | 28 |
| worktree_usage f003 | 0% | Failed | 230 | 14 |
| worktree_usage f004 | 0% | Failed | 234 | 10 |
| worktree_usage f005 | 33.3% | Failed | 139 | 19 |
| worktree_usage f006 | 0% | Failed | 145 | 21 |
| worktree_usage f007 | 0% | Failed | 148 | 25 |
| worktree_usage f008 | 0% | Failed | 164 | 54 |
| worktree_usage f009 | 0% | Failed | 268 | 11 |
| worktree_usage f010 | 0% | Failed | 237 | 21 |
| worktree_usage f011 | 0% | Failed | 233 | 14 |
| worktree_usage f012 | 0% | Failed | 142 | 19 |