Nvidia / nemotron-3-nano-30b-a3b
high
86.8%
177 / 204 fixtures
1 run(s)
42,451 input / 291,706 total output / 89,059 reasoning within output tokens
$0.06158866
Reliability by Benchmark (Text)
Text vs JSON Schema Comparison
Pass Rate Delta: -2.5% (Text: 86.8% → JSON: 84.3%)
| Outcome | Fixtures | Meaning |
|---|---|---|
| Gained | +8 | JSON pass / text fail |
| Lost | −13 | Text pass / JSON fail |
| Unchanged Pass | 164 | Both pass |
| Unchanged Fail | 19 | Both fail |
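The summary counts above can be cross-checked against the headline pass rates. The following is a minimal sketch, not the dashboard's own calculation: the variable names are illustrative, and it assumes the delta is JSON minus text, expressed in percentage points.

```python
# Counts copied from the Text vs JSON Schema comparison summary.
gained = 8        # JSON pass / text fail
lost = 13         # text pass / JSON fail
both_pass = 164   # pass in both modes
both_fail = 19    # fail in both modes

total = gained + lost + both_pass + both_fail
text_pass = both_pass + lost      # fixtures that passed in text mode
json_pass = both_pass + gained    # fixtures that passed in JSON mode

text_rate = round(100 * text_pass / total, 1)
json_rate = round(100 * json_pass / total, 1)
# Assumed convention: delta = JSON rate - text rate, in percentage points.
delta_points = round(json_rate - text_rate, 1)

print(total, text_pass, json_pass)
print(text_rate, json_rate, delta_points)
```

Under these assumptions the counts reproduce 177 / 204 text passes, the 86.8% and 84.3% pass rates, and the -2.5 point delta, and the gained and lost counts together account for the 21 changed fixtures.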
Fixture Reliability Delta
| Fixture | Text | JSON | Delta |
|---|---|---|---|
| f010 | 100% (1/1) | 0% (0/1) | -100% |
Benchmark Deltas
| Benchmark | Text | JSON | Delta |
|---|---|---|---|
| reflog | 91.7% | 66.7% | -25% |
| rebase | 75% | 58.3% | -16.7% |
| blame_forensics | 83.3% | 91.7% | +8.3% |
| commit_messages | 100% | 91.7% | -8.3% |
| commit_squash | 83.3% | 75% | -8.3% |
| git_clean | 91.7% | 100% | +8.3% |
| merge_conflicts | 58.3% | 50% | -8.3% |
| stash_recovery | 91.7% | 100% | +8.3% |
| submodule_usage | 75% | 83.3% | +8.3% |
| cherry_pick | 50% | 41.7% | -8.3% |
| branch_cleanup | 100% | 100% | 0% |
| git_bisect | 100% | 100% | 0% |
| git_grep | 100% | 100% | 0% |
| git_log_format | 100% | 100% | 0% |
| git_show | 100% | 100% | 0% |
| tag_management | 91.7% | 91.7% | 0% |
| worktree_usage | 83.3% | 83.3% | 0% |
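The percentage deltas in this table can be converted back into fixture counts. This is a sketch under one assumption, labeled in the code: each benchmark has 12 fixtures, as the gallery's f001 to f012 numbering suggests. The helper name `passes` is hypothetical.

```python
FIXTURES_PER_BENCHMARK = 12  # assumption: the gallery lists f001..f012 per benchmark

# (text %, JSON %) copied from the Benchmark Deltas table; unchanged rows omitted.
rates = {
    "reflog": (91.7, 66.7),
    "rebase": (75.0, 58.3),
    "blame_forensics": (83.3, 91.7),
    "commit_messages": (100.0, 91.7),
    "commit_squash": (83.3, 75.0),
    "git_clean": (91.7, 100.0),
    "merge_conflicts": (58.3, 50.0),
    "stash_recovery": (91.7, 100.0),
    "submodule_usage": (75.0, 83.3),
    "cherry_pick": (50.0, 41.7),
}

def passes(rate_percent: float) -> int:
    """Convert a rounded pass rate into a whole number of passing fixtures."""
    return round(rate_percent / 100 * FIXTURES_PER_BENCHMARK)

# Net change in passing fixtures per benchmark (JSON minus text).
fixture_deltas = {name: passes(j) - passes(t) for name, (t, j) in rates.items()}
net = sum(fixture_deltas.values())

for name, delta in fixture_deltas.items():
    print(f"{name}: {delta:+d}")
print("net:", net)
```

Each 8.3% step corresponds to one fixture, and the net of -5 agrees with 8 gained minus 13 lost. The per-benchmark figures are net changes only, so they do not show which individual fixtures flipped.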
Changed Fixtures (21)
Fixture Gallery (204)
| Fixture | Score | Status | Input tokens | Total output tokens | Reasoning within output tokens |
|---|---|---|---|---|---|
| blame_forensics f001 | 100% | Passed | 259 | 105 | 109 |
| blame_forensics f002 | 100% | Passed | 243 | 2,162 | 2,165 |
| blame_forensics f003 | 100% | Passed | 292 | 223 | 206 |
| blame_forensics f004 | 100% | Passed | 207 | 91 | 111 |
| blame_forensics f005 | 0% | Failed | 282 | 348 | 357 |
| blame_forensics f006 | 0% | Failed | 248 | 32,768 | 121 |
| blame_forensics f007 | 100% | Passed | 236 | 255 | 200 |
| blame_forensics f008 | 100% | Passed | 261 | 424 | 446 |
| blame_forensics f009 | 100% | Passed | 212 | 155 | 170 |
| blame_forensics f010 | 100% | Passed | 445 | 223 | 227 |
| blame_forensics f011 | 100% | Passed | 259 | 158 | 140 |
| blame_forensics f012 | 100% | Passed | 156 | 203 | 197 |
| branch_cleanup f001 | 100% | Passed | 117 | 164 | 169 |
| branch_cleanup f002 | 100% | Passed | 132 | 194 | 207 |
| branch_cleanup f003 | 100% | Passed | 108 | 378 | 414 |
| branch_cleanup f004 | 100% | Passed | 145 | 162 | 178 |
| branch_cleanup f005 | 100% | Passed | 119 | 299 | 319 |
| branch_cleanup f006 | 100% | Passed | 147 | 131 | 127 |
| branch_cleanup f007 | 100% | Passed | 108 | 189 | 212 |
| branch_cleanup f008 | 100% | Passed | 139 | 131 | 127 |
| branch_cleanup f009 | 100% | Passed | 91 | 79 | 90 |
| branch_cleanup f010 | 100% | Passed | 129 | 125 | 102 |
| branch_cleanup f011 | 100% | Passed | 81 | 82 | 87 |
| branch_cleanup f012 | 100% | Passed | 170 | 169 | 173 |
| cherry_pick f001 | 0% | Failed | 109 | 1,478 | 1,638 |
| cherry_pick f002 | 0% | Failed | 133 | 32,768 | 81 |
| cherry_pick f003 | 100% | Passed | 102 | 420 | 478 |
| cherry_pick f004 | 0% | Failed | 177 | 563 | 533 |
| cherry_pick f005 | 100% | Passed | 142 | 325 | 358 |
| cherry_pick f006 | 0% | Failed | 157 | 1,075 | 1,055 |
| cherry_pick f007 | 100% | Passed | 154 | 206 | 195 |
| cherry_pick f008 | 100% | Passed | 137 | 78 | 83 |
| cherry_pick f009 | 100% | Passed | 122 | 106 | 107 |
| cherry_pick f010 | 0% | Failed | 241 | 610 | 708 |
| cherry_pick f011 | 100% | Passed | 177 | 545 | 518 |
| cherry_pick f012 | 0% | Failed | 284 | 275 | 239 |
| commit_messages f001 | 59.3% | Passed | 122 | 107 | 59 |
| commit_messages f002 | 91.7% | Passed | 239 | 432 | 322 |
| commit_messages f003 | 92.3% | Passed | 94 | 451 | 306 |
| commit_messages f004 | 94% | Passed | 131 | 445 | 324 |
| commit_messages f005 | 91.7% | Passed | 116 | 249 | 187 |
| commit_messages f006 | 91% | Passed | 162 | 204 | 178 |
| commit_messages f007 | 91.7% | Passed | 204 | 229 | 195 |
| commit_messages f008 | 93.3% | Passed | 87 | 189 | 170 |
| commit_messages f009 | 80% | Passed | 143 | 232 | 187 |
| commit_messages f010 | 77.7% | Passed | 311 | 461 | 389 |
| commit_messages f011 | 54.3% | Passed | 155 | 1,011 | 634 |
| commit_messages f012 | 89% | Passed | 134 | 225 | 166 |
| commit_squash f001 | 100% | Passed | 167 | 464 | 434 |
| commit_squash f002 | 100% | Passed | 129 | 220 | 209 |
| commit_squash f003 | 100% | Passed | 131 | 3,009 | 2,984 |
| commit_squash f004 | 100% | Passed | 132 | 563 | 457 |
| commit_squash f005 | 100% | Failed | 118 | 566 | 422 |
| commit_squash f006 | 0% | Failed | 102 | 32,768 | 32 |
| commit_squash f007 | 100% | Passed | 106 | 185 | 142 |
| commit_squash f008 | 100% | Passed | 119 | 519 | 430 |
| commit_squash f009 | 100% | Passed | 113 | 668 | 651 |
| commit_squash f010 | 100% | Passed | 124 | 243 | 202 |
| commit_squash f011 | 100% | Passed | 132 | 229 | 189 |
| commit_squash f012 | 100% | Passed | 132 | 306 | 280 |
| git_bisect f001 | 100% | Passed | 192 | 125 | 123 |
| git_bisect f002 | 100% | Passed | 228 | 354 | 318 |
| git_bisect f003 | 100% | Passed | 220 | 348 | 306 |
| git_bisect f004 | 100% | Passed | 222 | 672 | 671 |
| git_bisect f005 | 100% | Passed | 224 | 596 | 656 |
| git_bisect f006 | 100% | Passed | 250 | 144 | 144 |
| git_bisect f007 | 100% | Passed | 224 | 168 | 166 |
| git_bisect f008 | 100% | Passed | 226 | 212 | 190 |
| git_bisect f009 | 100% | Passed | 228 | 230 | 211 |
| git_bisect f010 | 100% | Passed | 228 | 366 | 325 |
| git_bisect f011 | 100% | Passed | 222 | 181 | 166 |
| git_bisect f012 | 100% | Passed | 254 | 444 | 401 |
| git_clean f001 | 100% | Passed | 61 | 208 | 190 |
| git_clean f002 | 100% | Passed | 71 | 184 | 193 |
| git_clean f003 | 33.3% | Failed | 67 | 32,768 | 66 |
| git_clean f004 | 100% | Passed | 69 | 688 | 748 |
| git_clean f005 | 100% | Passed | 59 | 8,761 | 8,988 |
| git_clean f006 | 100% | Passed | 86 | 243 | 249 |
| git_clean f007 | 100% | Passed | 81 | 440 | 462 |
| git_clean f008 | 100% | Passed | 85 | 519 | 540 |
| git_clean f009 | 100% | Passed | 76 | 305 | 334 |
| git_clean f010 | 100% | Passed | 73 | 255 | 258 |
| git_clean f011 | 100% | Passed | 72 | 493 | 518 |
| git_clean f012 | 100% | Passed | 75 | 313 | 316 |
| git_grep f001 | 100% | Passed | 62 | 5,230 | 6,308 |
| git_grep f002 | 100% | Passed | 76 | 75 | 79 |
| git_grep f003 | 100% | Passed | 95 | 95 | 96 |
| git_grep f004 | 100% | Passed | 85 | 56 | 45 |
| git_grep f005 | 100% | Passed | 119 | 163 | 166 |
| git_grep f006 | 100% | Passed | 49 | 238 | 264 |
| git_grep f007 | 100% | Passed | 169 | 2,343 | 2,350 |
| git_grep f008 | 100% | Passed | 68 | 51 | 52 |
| git_grep f009 | 100% | Passed | 62 | 79 | 75 |
| git_grep f010 | 100% | Passed | 72 | 574 | 618 |
| git_grep f011 | 100% | Passed | 117 | 166 | 187 |
| git_grep f012 | 100% | Passed | 73 | 76 | 66 |
| git_log_format f001 | 100% | Passed | 920 | 120 | 128 |
| git_log_format f002 | 100% | Passed | 887 | 161 | 190 |
| git_log_format f003 | 100% | Passed | 733 | 100 | 105 |
| git_log_format f004 | 100% | Passed | 922 | 91 | 74 |
| git_log_format f005 | 100% | Passed | 748 | 632 | 519 |
| git_log_format f006 | 100% | Passed | 927 | 299 | 176 |
| git_log_format f007 | 100% | Passed | 912 | 195 | 175 |
| git_log_format f008 | 100% | Passed | 1,232 | 105 | 102 |
| git_log_format f009 | 100% | Passed | 719 | 86 | 84 |
| git_log_format f010 | 100% | Passed | 1,372 | 128 | 122 |
| git_log_format f011 | 100% | Passed | 458 | 153 | 138 |
| git_log_format f012 | 100% | Passed | 418 | 211 | 166 |
| git_show f001 | 100% | Passed | 208 | 67 | 69 |
| git_show f002 | 100% | Passed | 206 | 653 | 634 |
| git_show f003 | 100% | Passed | 689 | 137 | 109 |
| git_show f004 | 100% | Passed | 482 | 75 | 61 |
| git_show f005 | 100% | Passed | 208 | 230 | 256 |
| git_show f006 | 100% | Passed | 219 | 103 | 99 |
| git_show f007 | 100% | Passed | 205 | 66 | 59 |
| git_show f008 | 100% | Passed | 219 | 398 | 314 |
| git_show f009 | 100% | Passed | 324 | 233 | 191 |
| git_show f010 | 100% | Passed | 206 | 199 | 201 |
| git_show f011 | 100% | Passed | 234 | 73 | 74 |
| git_show f012 | 100% | Passed | 347 | 81 | 83 |
| merge_conflicts f001 | 0% | Failed | 98 | 2,074 | 2,356 |
| merge_conflicts f002 | 100% | Passed | 122 | 543 | 517 |
| merge_conflicts f003 | 100% | Passed | 92 | 701 | 785 |
| merge_conflicts f004 | 0% | Failed | 163 | 820 | 697 |
| merge_conflicts f005 | 100% | Passed | 130 | 781 | 881 |
| merge_conflicts f006 | 0% | Failed | 143 | 32,768 | 26 |
| merge_conflicts f007 | 100% | Passed | 153 | 190 | 180 |
| merge_conflicts f008 | 100% | Passed | 119 | 213 | 215 |
| merge_conflicts f009 | 100% | Passed | 119 | 414 | 429 |
| merge_conflicts f010 | 0% | Failed | 202 | 282 | 304 |
| merge_conflicts f011 | 100% | Passed | 156 | 573 | 529 |
| merge_conflicts f012 | 0% | Failed | 251 | 32,768 | 21 |
| rebase f001 | 0% | Failed | 110 | 1,044 | 1,219 |
| rebase f002 | 100% | Passed | 144 | 171 | 151 |
| rebase f003 | 100% | Passed | 117 | 76 | 84 |
| rebase f004 | 100% | Passed | 177 | 261 | 211 |
| rebase f005 | 100% | Passed | 141 | 862 | 968 |
| rebase f006 | 100% | Passed | 157 | 996 | 918 |
| rebase f007 | 100% | Passed | 155 | 172 | 176 |
| rebase f008 | 100% | Passed | 132 | 194 | 206 |
| rebase f009 | 100% | Passed | 116 | 101 | 118 |
| rebase f010 | 0% | Failed | 227 | 532 | 561 |
| rebase f011 | 100% | Passed | 169 | 889 | 798 |
| rebase f012 | 0% | Failed | 282 | 511 | 547 |
| reflog f001 | 0% | Failed | 264 | 287 | 211 |
| reflog f002 | 100% | Passed | 254 | 258 | 135 |
| reflog f003 | 100% | Passed | 218 | 280 | 178 |
| reflog f004 | 100% | Passed | 363 | 1,648 | 617 |
| reflog f005 | 100% | Passed | 427 | 859 | 122 |
| reflog f006 | 100% | Passed | 275 | 362 | 127 |
| reflog f007 | 100% | Passed | 318 | 416 | 154 |
| reflog f008 | 100% | Passed | 401 | 280 | 57 |
| reflog f009 | 100% | Passed | 363 | 1,198 | 232 |
| reflog f010 | 100% | Passed | 304 | 1,191 | 132 |
| reflog f011 | 100% | Passed | 349 | 555 | 341 |
| reflog f012 | 100% | Passed | 356 | 1,255 | 304 |
| stash_recovery f001 | 100% | Passed | 151 | 124 | 90 |
| stash_recovery f002 | 100% | Passed | 319 | 174 | 110 |
| stash_recovery f003 | 100% | Passed | 136 | 160 | 111 |
| stash_recovery f004 | 100% | Passed | 132 | 94 | 68 |
| stash_recovery f005 | 100% | Passed | 145 | 82 | 17 |
| stash_recovery f006 | 0% | Failed | 195 | 36 | 45 |
| stash_recovery f007 | 100% | Passed | 225 | 84 | 52 |
| stash_recovery f008 | 100% | Passed | 194 | 437 | 316 |
| stash_recovery f009 | 100% | Passed | 229 | 584 | 459 |
| stash_recovery f010 | 100% | Passed | 141 | 140 | 111 |
| stash_recovery f011 | 100% | Passed | 135 | 330 | 235 |
| stash_recovery f012 | 100% | Passed | 150 | 169 | 113 |
| submodule_usage f001 | 100% | Passed | 52 | 86 | 80 |
| submodule_usage f002 | 100% | Passed | 109 | 471 | 525 |
| submodule_usage f003 | 75% | Failed | 119 | 804 | 771 |
| submodule_usage f004 | 100% | Passed | 102 | 206 | 226 |
| submodule_usage f005 | 66.7% | Failed | 59 | 2,567 | 2,888 |
| submodule_usage f006 | 0% | Failed | 100 | 488 | 538 |
| submodule_usage f007 | 100% | Passed | 60 | 159 | 144 |
| submodule_usage f008 | 100% | Passed | 111 | 1,160 | 1,251 |
| submodule_usage f009 | 100% | Passed | 107 | 193 | 215 |
| submodule_usage f010 | 100% | Passed | 108 | 350 | 411 |
| submodule_usage f011 | 100% | Passed | 109 | 359 | 399 |
| submodule_usage f012 | 100% | Passed | 59 | 154 | 154 |
| tag_management f001 | 100% | Passed | 70 | 177 | 157 |
| tag_management f002 | 100% | Passed | 66 | 2,377 | 2,671 |
| tag_management f003 | 100% | Passed | 74 | 186 | 168 |
| tag_management f004 | 100% | Passed | 82 | 206 | 196 |
| tag_management f005 | 100% | Passed | 93 | 247 | 228 |
| tag_management f006 | 100% | Passed | 76 | 105 | 87 |
| tag_management f007 | 100% | Passed | 74 | 1,367 | 1,171 |
| tag_management f008 | 33.3% | Failed | 79 | 1,287 | 1,049 |
| tag_management f009 | 100% | Passed | 81 | 400 | 403 |
| tag_management f010 | 100% | Passed | 58 | 60 | 51 |
| tag_management f011 | 100% | Passed | 60 | 93 | 105 |
| tag_management f012 | 100% | Passed | 99 | 729 | 670 |
| worktree_usage f001 | 100% | Passed | 158 | 228 | 245 |
| worktree_usage f002 | 100% | Passed | 167 | 590 | 560 |
| worktree_usage f003 | 100% | Passed | 241 | 154 | 137 |
| worktree_usage f004 | 100% | Passed | 243 | 125 | 127 |
| worktree_usage f005 | 33.3% | Failed | 153 | 610 | 623 |
| worktree_usage f006 | 100% | Passed | 158 | 357 | 328 |
| worktree_usage f007 | 100% | Passed | 155 | 291 | 303 |
| worktree_usage f008 | 50% | Failed | 179 | 622 | 596 |
| worktree_usage f009 | 100% | Passed | 269 | 342 | 366 |
| worktree_usage f010 | 100% | Passed | 250 | 313 | 326 |
| worktree_usage f011 | 100% | Passed | 239 | 346 | 354 |
| worktree_usage f012 | 100% | Passed | 154 | 157 | 132 |