Nvidia / nemotron-3-nano-30b-a3b
none
Pass rate: 65.2% (133 / 204 fixtures, 1 run)
Tokens: 42,553 input / 47,931 total output / 0 reasoning within output
Cost: $0.01171861
Text vs JSON Schema Comparison
Pass Rate Delta: +6.4 percentage points (Text: 65.2% → JSON: 71.6%)

| Change | Fixtures | Meaning |
|---|---|---|
| Gained | +32 | JSON pass / text fail |
| Lost | −19 | Text pass / JSON fail |
| Unchanged Pass | 114 | Both pass |
| Unchanged Fail | 39 | Both fail |

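The four comparison counts determine both headline pass rates, so they can be cross-checked with a little arithmetic. The sketch below is only a sanity check on the figures reported above; the variable names and the `pct` helper are illustrative and are not part of the benchmark tooling.

```python
# Counts copied from the Text vs JSON Schema comparison summary.
gained = 32           # JSON pass / text fail
lost = 19             # text pass / JSON fail
unchanged_pass = 114  # both pass
unchanged_fail = 39   # both fail

# Every fixture falls into exactly one of the four buckets.
total = gained + lost + unchanged_pass + unchanged_fail

# A fixture passes in text mode if it passed in both modes or was "lost" in JSON mode.
text_pass = unchanged_pass + lost
# A fixture passes in JSON mode if it passed in both modes or was "gained".
json_pass = unchanged_pass + gained


def pct(passed: int, out_of: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / out_of, 1)


text_rate = pct(text_pass, total)
json_rate = pct(json_pass, total)
delta = round(json_rate - text_rate, 1)

print(f"{text_pass}/{total} = {text_rate}%  ->  {json_pass}/{total} = {json_rate}%  (delta {delta:+})")
```

Under these counts the totals reconcile with the summary: 204 fixtures, 133 text passes, and 146 JSON passes.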
Fixture Reliability Delta
Benchmark Deltas
| Benchmark | Text | JSON | Delta |
|---|---|---|---|
| commit_squash | 33.3% | 100% | +66.7% |
| git_bisect | 58.3% | 91.7% | +33.3% |
| git_clean | 91.7% | 66.7% | -25% |
| cherry_pick | 33.3% | 50% | +16.7% |
| commit_messages | 83.3% | 100% | +16.7% |
| worktree_usage | 50% | 66.7% | +16.7% |
| reflog | 83.3% | 66.7% | -16.7% |
| git_grep | 83.3% | 91.7% | +8.3% |
| branch_cleanup | 58.3% | 50% | -8.3% |
| git_log_format | 66.7% | 75% | +8.3% |
| stash_recovery | 100% | 91.7% | -8.3% |
| submodule_usage | 58.3% | 50% | -8.3% |
| tag_management | 75% | 83.3% | +8.3% |
| blame_forensics | 50% | 50% | 0% |
| git_show | 83.3% | 83.3% | 0% |
| merge_conflicts | 50% | 50% | 0% |
| rebase | 50% | 50% | 0% |
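Some deltas in the table do not equal the difference of the two displayed percentages (for git_bisect, 91.7 − 58.3 is 33.4, not 33.3). A likely explanation is that the delta is computed from pass counts and rounded afterwards. The sketch below illustrates this for git_bisect, assuming 12 fixtures per benchmark (the gallery lists f001 to f012 for each) and 11 JSON passes (inferred from the 91.7% figure, not stated directly in the report).

```python
FIXTURES_PER_BENCHMARK = 12  # assumption: each benchmark lists fixtures f001-f012


def rate(passed: int) -> float:
    """Unrounded pass rate as a percentage."""
    return 100 * passed / FIXTURES_PER_BENCHMARK


# git_bisect: 7 text passes; 11 JSON passes is inferred from the 91.7% in the table.
text_passed, json_passed = 7, 11

text_rate = round(rate(text_passed), 1)
json_rate = round(rate(json_passed), 1)

# Rounding after subtracting reproduces the table's delta.
delta_from_counts = round(rate(json_passed) - rate(text_passed), 1)
# Subtracting the already-rounded percentages is off by 0.1.
delta_from_rounded = round(json_rate - text_rate, 1)

print(text_rate, json_rate, delta_from_counts, delta_from_rounded)
```

The same reasoning applies to any row whose rates are multiples of 1/12.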
Changed Fixtures (51)
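
| Benchmark | Changed fixtures |
|---|---|
| blame_forensics | f001 -loss, f004 +gain, f008 -loss, f009 +gain |
| branch_cleanup | f009 -loss, f010 +gain, f011 -loss |
| cherry_pick | f004 +gain, f009 +gain |
| commit_messages | f005 +gain, f006 +gain |
| commit_squash | f003 +gain, f005 +gain, f006 +gain, f007 +gain, f009 +gain, f010 +gain, f011 +gain, f012 +gain |
| git_bisect | f002 +gain, f004 +gain, f005 +gain, f007 +gain |
| git_clean | f004 -loss, f006 -loss, f007 -loss |
| git_grep | f011 +gain |
| git_log_format | f007 +gain |
| merge_conflicts | f002 -loss, f007 -loss, f009 +gain, f011 +gain |
| rebase | f005 +gain, f007 -loss, f008 +gain, f011 -loss |
| reflog | f001 +gain, f006 -loss, f009 -loss, f010 -loss |
| stash_recovery | f008 -loss |
| submodule_usage | f001 +gain, f006 -loss, f012 -loss |
| tag_management | f008 -loss, f009 +gain, f010 +gain |
| worktree_usage | f003 +gain, f007 +gain, f009 +gain, f011 -loss |
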
Fixture Gallery (204)

| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| blame_forensics f001 | 100% | Passed | 260 | 7 | 0 |
| blame_forensics f002 | 0% | Failed | 255 | 12 | 0 |
| blame_forensics f003 | 0% | Failed | 293 | 16 | 0 |
| blame_forensics f004 | 0% | Failed | 207 | 13 | 0 |
| blame_forensics f005 | 0% | Failed | 281 | 13 | 0 |
| blame_forensics f006 | 100% | Passed | 246 | 4 | 0 |
| blame_forensics f007 | 100% | Passed | 226 | 6 | 0 |
| blame_forensics f008 | 100% | Passed | 259 | 5 | 0 |
| blame_forensics f009 | 0% | Failed | 211 | 12 | 0 |
| blame_forensics f010 | 0% | Failed | 452 | 4 | 0 |
| blame_forensics f011 | 100% | Passed | 258 | 4 | 0 |
| blame_forensics f012 | 100% | Passed | 159 | 7 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| branch_cleanup f001 | 100% | Failed | 119 | 7 | 0 |
| branch_cleanup f002 | 100% | Failed | 133 | 8 | 0 |
| branch_cleanup f003 | 0% | Failed | 110 | 6 | 0 |
| branch_cleanup f004 | 0% | Failed | 147 | 2 | 0 |
| branch_cleanup f005 | 100% | Passed | 113 | 4 | 0 |
| branch_cleanup f006 | 100% | Passed | 143 | 10 | 0 |
| branch_cleanup f007 | 100% | Passed | 117 | 6 | 0 |
| branch_cleanup f008 | 100% | Passed | 139 | 11 | 0 |
| branch_cleanup f009 | 100% | Passed | 94 | 2 | 0 |
| branch_cleanup f010 | 0% | Failed | 131 | 2 | 0 |
| branch_cleanup f011 | 100% | Passed | 81 | 2 | 0 |
| branch_cleanup f012 | 100% | Passed | 175 | 12 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| cherry_pick f001 | 0% | Failed | 109 | 5 | 0 |
| cherry_pick f002 | 100% | Passed | 136 | 9 | 0 |
| cherry_pick f003 | 100% | Passed | 103 | 5 | 0 |
| cherry_pick f004 | 0% | Failed | 178 | 107 | 0 |
| cherry_pick f005 | 0% | Failed | 143 | 18 | 0 |
| cherry_pick f006 | 0% | Failed | 158 | 27 | 0 |
| cherry_pick f007 | 100% | Passed | 153 | 12 | 0 |
| cherry_pick f008 | 100% | Passed | 136 | 7 | 0 |
| cherry_pick f009 | 0% | Failed | 122 | 8 | 0 |
| cherry_pick f010 | 0% | Failed | 239 | 29 | 0 |
| cherry_pick f011 | 0% | Failed | 177 | 22 | 0 |
| cherry_pick f012 | 0% | Failed | 288 | 19 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| commit_messages f001 | 95.3% | Passed | 122 | 4 | 0 |
| commit_messages f002 | 87.7% | Passed | 239 | 7 | 0 |
| commit_messages f003 | 83.3% | Passed | 94 | 12 | 0 |
| commit_messages f004 | 87.7% | Passed | 131 | 6 | 0 |
| commit_messages f005 | 16.7% | Failed | 116 | 6 | 0 |
| commit_messages f006 | 3.3% | Failed | 162 | 32,768 | 0 |
| commit_messages f007 | 91.3% | Passed | 204 | 7 | 0 |
| commit_messages f008 | 87% | Passed | 87 | 5 | 0 |
| commit_messages f009 | 85% | Passed | 143 | 10 | 0 |
| commit_messages f010 | 89.7% | Passed | 311 | 11 | 0 |
| commit_messages f011 | 64% | Passed | 155 | 11 | 0 |
| commit_messages f012 | 71.7% | Passed | 134 | 9 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| commit_squash f001 | 100% | Passed | 167 | 37 | 0 |
| commit_squash f002 | 100% | Passed | 128 | 14 | 0 |
| commit_squash f003 | 100% | Failed | 137 | 45 | 0 |
| commit_squash f004 | 100% | Passed | 131 | 22 | 0 |
| commit_squash f005 | 100% | Failed | 115 | 384 | 0 |
| commit_squash f006 | 100% | Failed | 103 | 422 | 0 |
| commit_squash f007 | 100% | Failed | 107 | 275 | 0 |
| commit_squash f008 | 100% | Passed | 119 | 48 | 0 |
| commit_squash f009 | 100% | Failed | 108 | 178 | 0 |
| commit_squash f010 | 100% | Failed | 123 | 258 | 0 |
| commit_squash f011 | 100% | Failed | 134 | 655 | 0 |
| commit_squash f012 | 100% | Failed | 134 | 440 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| git_bisect f001 | 100% | Passed | 194 | 6 | 0 |
| git_bisect f002 | 0% | Failed | 222 | 9 | 0 |
| git_bisect f003 | 100% | Passed | 216 | 8 | 0 |
| git_bisect f004 | 0% | Failed | 228 | 9 | 0 |
| git_bisect f005 | 0% | Failed | 222 | 170 | 0 |
| git_bisect f006 | 100% | Passed | 252 | 7 | 0 |
| git_bisect f007 | 0% | Failed | 222 | 7 | 0 |
| git_bisect f008 | 100% | Passed | 228 | 33 | 0 |
| git_bisect f009 | 100% | Passed | 226 | 10 | 0 |
| git_bisect f010 | 100% | Passed | 218 | 10 | 0 |
| git_bisect f011 | 0% | Failed | 220 | 6 | 0 |
| git_bisect f012 | 100% | Passed | 250 | 62 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| git_clean f001 | 100% | Passed | 61 | 9 | 0 |
| git_clean f002 | 100% | Passed | 71 | 5 | 0 |
| git_clean f003 | 100% | Passed | 67 | 6 | 0 |
| git_clean f004 | 100% | Passed | 69 | 14 | 0 |
| git_clean f005 | 50% | Failed | 59 | 1,884 | 0 |
| git_clean f006 | 100% | Passed | 86 | 8 | 0 |
| git_clean f007 | 100% | Passed | 81 | 20 | 0 |
| git_clean f008 | 100% | Passed | 85 | 6 | 0 |
| git_clean f009 | 100% | Passed | 76 | 9 | 0 |
| git_clean f010 | 100% | Passed | 73 | 11 | 0 |
| git_clean f011 | 100% | Passed | 72 | 6 | 0 |
| git_clean f012 | 100% | Passed | 75 | 8 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| git_grep f001 | 100% | Passed | 62 | 4 | 0 |
| git_grep f002 | 100% | Passed | 77 | 2 | 0 |
| git_grep f003 | 100% | Passed | 95 | 2 | 0 |
| git_grep f004 | 100% | Passed | 85 | 6 | 0 |
| git_grep f005 | 100% | Passed | 119 | 2 | 0 |
| git_grep f006 | 100% | Passed | 49 | 2 | 0 |
| git_grep f007 | 0% | Failed | 169 | 2 | 0 |
| git_grep f008 | 100% | Passed | 68 | 2 | 0 |
| git_grep f009 | 100% | Passed | 62 | 4 | 0 |
| git_grep f010 | 100% | Passed | 72 | 2 | 0 |
| git_grep f011 | 0% | Failed | 117 | 2 | 0 |
| git_grep f012 | 100% | Passed | 73 | 9 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| git_log_format f001 | 100% | Passed | 929 | 2 | 0 |
| git_log_format f002 | 100% | Passed | 905 | 8 | 0 |
| git_log_format f003 | 100% | Passed | 746 | 10 | 0 |
| git_log_format f004 | 0% | Failed | 914 | 3 | 0 |
| git_log_format f005 | 0% | Failed | 748 | 112 | 0 |
| git_log_format f006 | 0% | Failed | 927 | 2 | 0 |
| git_log_format f007 | 0% | Failed | 940 | 39 | 0 |
| git_log_format f008 | 100% | Passed | 1,219 | 2 | 0 |
| git_log_format f009 | 100% | Passed | 744 | 2 | 0 |
| git_log_format f010 | 100% | Passed | 1,423 | 2 | 0 |
| git_log_format f011 | 100% | Passed | 458 | 3 | 0 |
| git_log_format f012 | 100% | Passed | 415 | 2 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| git_show f001 | 100% | Passed | 207 | 5 | 0 |
| git_show f002 | 0% | Failed | 206 | 3 | 0 |
| git_show f003 | 100% | Passed | 697 | 7 | 0 |
| git_show f004 | 100% | Passed | 472 | 7 | 0 |
| git_show f005 | 0% | Failed | 209 | 3 | 0 |
| git_show f006 | 100% | Passed | 220 | 2 | 0 |
| git_show f007 | 100% | Passed | 205 | 5 | 0 |
| git_show f008 | 100% | Passed | 217 | 44 | 0 |
| git_show f009 | 100% | Passed | 321 | 2 | 0 |
| git_show f010 | 100% | Passed | 204 | 2 | 0 |
| git_show f011 | 100% | Passed | 235 | 3 | 0 |
| git_show f012 | 100% | Passed | 342 | 2 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| merge_conflicts f001 | 0% | Failed | 98 | 5 | 0 |
| merge_conflicts f002 | 100% | Passed | 122 | 9 | 0 |
| merge_conflicts f003 | 100% | Passed | 92 | 5 | 0 |
| merge_conflicts f004 | 100% | Passed | 163 | 26 | 0 |
| merge_conflicts f005 | 100% | Passed | 130 | 17 | 0 |
| merge_conflicts f006 | 0% | Failed | 143 | 19 | 0 |
| merge_conflicts f007 | 100% | Passed | 153 | 12 | 0 |
| merge_conflicts f008 | 100% | Passed | 119 | 7 | 0 |
| merge_conflicts f009 | 0% | Failed | 119 | 5 | 0 |
| merge_conflicts f010 | 0% | Failed | 202 | 29 | 0 |
| merge_conflicts f011 | 0% | Failed | 156 | 21 | 0 |
| merge_conflicts f012 | 0% | Failed | 251 | 21 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| rebase f001 | 0% | Failed | 111 | 5 | 0 |
| rebase f002 | 100% | Passed | 142 | 9 | 0 |
| rebase f003 | 100% | Passed | 117 | 4 | 0 |
| rebase f004 | 100% | Passed | 175 | 26 | 0 |
| rebase f005 | 0% | Failed | 143 | 18 | 0 |
| rebase f006 | 0% | Failed | 156 | 27 | 0 |
| rebase f007 | 100% | Passed | 157 | 12 | 0 |
| rebase f008 | 0% | Failed | 132 | 89 | 0 |
| rebase f009 | 100% | Passed | 119 | 6 | 0 |
| rebase f010 | 0% | Failed | 229 | 29 | 0 |
| rebase f011 | 100% | Passed | 168 | 22 | 0 |
| rebase f012 | 0% | Failed | 280 | 19 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| reflog f001 | 0% | Failed | 264 | 37 | 0 |
| reflog f002 | 100% | Passed | 259 | 170 | 0 |
| reflog f003 | 100% | Passed | 227 | 253 | 0 |
| reflog f004 | 100% | Passed | 355 | 679 | 0 |
| reflog f005 | 100% | Passed | 410 | 790 | 0 |
| reflog f006 | 100% | Passed | 262 | 429 | 0 |
| reflog f007 | 100% | Passed | 316 | 231 | 0 |
| reflog f008 | 0% | Failed | 396 | 26 | 0 |
| reflog f009 | 100% | Passed | 351 | 1,101 | 0 |
| reflog f010 | 100% | Passed | 309 | 1,106 | 0 |
| reflog f011 | 100% | Passed | 358 | 461 | 0 |
| reflog f012 | 100% | Passed | 354 | 1,267 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| stash_recovery f001 | 100% | Passed | 148 | 27 | 0 |
| stash_recovery f002 | 100% | Passed | 316 | 68 | 0 |
| stash_recovery f003 | 100% | Passed | 136 | 52 | 0 |
| stash_recovery f004 | 100% | Passed | 132 | 199 | 0 |
| stash_recovery f005 | 100% | Passed | 145 | 51 | 0 |
| stash_recovery f006 | 100% | Passed | 195 | 196 | 0 |
| stash_recovery f007 | 100% | Passed | 229 | 53 | 0 |
| stash_recovery f008 | 100% | Passed | 194 | 152 | 0 |
| stash_recovery f009 | 100% | Passed | 229 | 142 | 0 |
| stash_recovery f010 | 100% | Passed | 141 | 59 | 0 |
| stash_recovery f011 | 100% | Passed | 135 | 220 | 0 |
| stash_recovery f012 | 100% | Passed | 150 | 85 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| submodule_usage f001 | 0% | Failed | 52 | 19 | 0 |
| submodule_usage f002 | 100% | Passed | 115 | 10 | 0 |
| submodule_usage f003 | 75% | Failed | 117 | 64 | 0 |
| submodule_usage f004 | 0% | Failed | 100 | 7 | 0 |
| submodule_usage f005 | 66.7% | Failed | 59 | 10 | 0 |
| submodule_usage f006 | 100% | Passed | 100 | 5 | 0 |
| submodule_usage f007 | 100% | Passed | 60 | 25 | 0 |
| submodule_usage f008 | 33.3% | Failed | 109 | 29 | 0 |
| submodule_usage f009 | 100% | Passed | 103 | 11 | 0 |
| submodule_usage f010 | 100% | Passed | 114 | 8 | 0 |
| submodule_usage f011 | 100% | Passed | 115 | 5 | 0 |
| submodule_usage f012 | 100% | Passed | 59 | 13 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| tag_management f001 | 100% | Passed | 66 | 9 | 0 |
| tag_management f002 | 100% | Passed | 64 | 19 | 0 |
| tag_management f003 | 100% | Passed | 73 | 10 | 0 |
| tag_management f004 | 100% | Passed | 78 | 5 | 0 |
| tag_management f005 | 100% | Passed | 92 | 8 | 0 |
| tag_management f006 | 100% | Passed | 74 | 14 | 0 |
| tag_management f007 | 100% | Passed | 72 | 33 | 0 |
| tag_management f008 | 100% | Passed | 79 | 48 | 0 |
| tag_management f009 | 0% | Failed | 81 | 29 | 0 |
| tag_management f010 | 0% | Failed | 59 | 15 | 0 |
| tag_management f011 | 50% | Failed | 60 | 31 | 0 |
| tag_management f012 | 100% | Passed | 100 | 35 | 0 |


| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning tokens within output |
|---|---|---|---|---|---|
| worktree_usage f001 | 25% | Failed | 160 | 249 | 0 |
| worktree_usage f002 | 100% | Passed | 165 | 23 | 0 |
| worktree_usage f003 | 0% | Failed | 248 | 6 | 0 |
| worktree_usage f004 | 100% | Passed | 245 | 5 | 0 |
| worktree_usage f005 | 33.3% | Failed | 156 | 15 | 0 |
| worktree_usage f006 | 100% | Passed | 154 | 13 | 0 |
| worktree_usage f007 | 0% | Failed | 159 | 22 | 0 |
| worktree_usage f008 | 0% | Failed | 176 | 56 | 0 |
| worktree_usage f009 | 0% | Failed | 277 | 28 | 0 |
| worktree_usage f010 | 100% | Passed | 248 | 16 | 0 |
| worktree_usage f011 | 100% | Passed | 243 | 10 | 0 |
| worktree_usage f012 | 100% | Passed | 154 | 19 | 0 |
