A side-by-side view of the major closed and open reasoning models: architecture, training recipe, verifier kind, and headline benchmark numbers as published. Closed-model entries are flagged as vendor-reported.
| Model | Vendor | Released | License | Params (B) | Training | AIME-24 p@1 | MATH-500 | GPQA-D | SWE-V |
|---|---|---|---|---|---|---|---|---|---|
All scores are as published. Where a benchmark has multiple slices (LiveCodeBench date cutoffs, SWE-bench harness variants), the cell uses the slice that the vendor or paper reported as its headline result.
Closed-model numbers are vendor-reported. They have not been independently reproduced by the maintainers of this list. Treat them as primary-source claims, not measurements.
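The two notes above can be read as rules for how one table row is built. The following is a minimal Python sketch of one possible reading, not the list's actual code. The field names (`closed`, `slice`, `params_b`), the `*` marker for vendor-reported scores, and the `n/a` and `undisclosed` placeholders are all assumptions made for illustration; the real `models.json` schema is not shown here. The entry in the usage lines is a made-up placeholder, not a real model or a real score.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Column order copied from the table header.
COLUMNS = ["Model", "Vendor", "Released", "License", "Params (B)",
           "Training", "AIME-24 p@1", "MATH-500", "GPQA-D", "SWE-V"]
BENCHMARKS = COLUMNS[6:]


@dataclass
class Score:
    value: float     # score exactly as published
    slice: str = ""  # hypothetical field: which slice/harness the headline used


@dataclass
class ModelEntry:
    model: str
    vendor: str
    released: str
    license: str
    params_b: Optional[float]  # None when the vendor does not disclose it
    training: str
    closed: bool               # True -> scores are vendor-reported claims
    scores: Dict[str, Score] = field(default_factory=dict)


def format_score(entry: ModelEntry, benchmark: str) -> str:
    """Render one benchmark cell; '*' marks a vendor-reported number."""
    score = entry.scores.get(benchmark)
    if score is None:
        return "n/a"  # assumed placeholder for a missing score
    text = f"{score.value:g}"
    return text + "*" if entry.closed else text


def render_row(entry: ModelEntry) -> str:
    """Render one entry as a pipe-delimited row matching COLUMNS."""
    params = "undisclosed" if entry.params_b is None else f"{entry.params_b:g}"
    cells = [entry.model, entry.vendor, entry.released,
             entry.license, params, entry.training]
    cells += [format_score(entry, b) for b in BENCHMARKS]
    return "| " + " | ".join(cells) + " |"


# Usage with a placeholder entry (all values are illustrative only).
example = ModelEntry(
    model="example-closed", vendor="ExampleCo", released="2025-01",
    license="proprietary", params_b=None, training="RL (placeholder)",
    closed=True, scores={"MATH-500": Score(90.5, slice="headline")},
)
print(render_row(example))
```

Keeping the `closed` flag on the entry rather than on each score means the vendor-reported marker is applied uniformly to every benchmark cell in that row, which matches the note that all closed-model numbers are primary-source claims.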
Missing a model? Open an issue · View raw models.json