GitBench

Three levels: Provider → Base Model → Reasoning Level. Each base model can have multiple reasoning levels. Higher levels are usually more accurate but slower and costlier. Click any card to drill into fixture-by-fixture results.
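The Provider → Base Model → Reasoning Level hierarchy can be pictured as a nested structure. The sketch below is only illustrative. It uses the providers, models, and two output modes ("Text Output", "JSON Schema") listed on this page. The class names, the `entries` helper, and the placeholder reasoning level `"default"` are hypothetical, because the page does not name the reasoning levels for these models.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    name: str
    # Output modes shown for each model on the leaderboard.
    output_modes: tuple = ("Text Output", "JSON Schema")
    # Hypothetical placeholder: real reasoning-level names are not given on this page.
    reasoning_levels: list = field(default_factory=lambda: ["default"])


@dataclass
class Provider:
    name: str
    models: list


LEADERBOARD = [
    Provider("Deepseek", [Model("deepseek-v4-flash")]),
    Provider("Mistralai", [Model("devstral-2512")]),
    Provider("Nvidia", [Model("nemotron-3-nano-30b-a3b")]),
    Provider("Poolside", [Model("laguna-xs.2")]),
]


def entries(providers):
    """Yield one (provider, base model, reasoning level) triple per combination."""
    for provider in providers:
        for model in provider.models:
            for level in model.reasoning_levels:
                yield (provider.name, model.name, level)


for row in entries(LEADERBOARD):
    print(row)
```

A base model with several reasoning levels would simply list them in `reasoning_levels`. Each one would then become its own entry, which matches the page's note that a single base model can appear at more than one level.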

Deepseek

deepseek-v4-flash

Text Output
JSON Schema

Mistralai

devstral-2512

Text Output
JSON Schema

Nvidia

nemotron-3-nano-30b-a3b

Text Output
JSON Schema

Poolside

laguna-xs.2

Text Output
JSON Schema