Reasoning model families (catalog)

A short reference catalog of the major reasoning-model families circa 2025–2026. Not exhaustive — the goal is to disambiguate names that appear across chapters and essays.

For each family: release date(s), open-vs-closed status, key public claims, and where in this list it’s discussed.

Closed-weights families

OpenAI o-series

o1 (2024-09): the announcement that triggered the field’s pivot. RL-trained reasoning over long internal CoTs.
o1-mini (2024-09): smaller, faster, math-focused.
o1-pro (2024-Q4): inference-time-scaled o1.
o3 family (2024-12 announcement, 2025 rollout): successor with reportedly stronger Codeforces and FrontierMath results. All headline numbers are vendor-reported; no public weights or test infrastructure.
o4 family (2025-Q2+): the rolling successor line, starting with o4-mini (2025-04).

Status: 🔴 closed weights, vendor-reported numbers. Cited carefully throughout the list; never as “the canonical scaling curve.”

Anthropic Claude with extended thinking

Claude 3.5 Sonnet introduced visible thinking traces in select interfaces.
Claude 3.7 Sonnet (2025-02) — first Anthropic model with a publicly-documented “extended thinking” mode and explicit reasoning-budget control.
Claude Opus 4.x with thinking — the line discussed throughout this list when “Claude with thinking” appears.

Status: 🟡 partial. Methods writeups are substantive; weights are closed. Faithfulness work (Anthropic 2025) gives some research-grade insight into the reasoning traces.

Google Gemini reasoning variants

Gemini 2.5 Flash / Pro with thinking (2025-Q1+).
Gemini Deep Think (2025-Q2-Q3) — reportedly achieved a gold-medal-equivalent score at IMO 2025 (vendor-reported).

Status: 🔴 closed. Connected to the DeepMind AlphaProof/AlphaGeometry line (Chapter 4) on the formal-proof side.

Open-weights families

DeepSeek

DeepSeek-R1 / R1-Zero (2025-01): the open-recipe paradigm shift. R1-Zero is the no-SFT, pure-RL variant; R1 adds cold-start SFT before RL in a multi-stage pipeline on the same base.
R1-Distill-N: distilled variants on Qwen2.5 (1.5B, 7B, 14B, 32B) and Llama 3.1/3.3 (8B, 70B) bases.
DeepSeekMath (2024) — predecessor; introduces GRPO.
DeepSeek-V3 (2024-12) — the base that R1 builds on.

Status: 🟢 open weights, open recipe. The empirical anchor for most of this list.
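
The DeepSeekMath entry above credits that paper with introducing GRPO. As a rough illustration of its core idea, the sketch below computes group-relative advantages: several completions are sampled for one prompt, and each reward is normalized against the mean and standard deviation of its own group, so no learned value model is needed. The function name, the `eps` constant, and the choice of population standard deviation are assumptions made for this sketch; the full GRPO objective also includes a clipped policy ratio and a KL penalty, which are omitted here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for completions sampled from the SAME prompt.
    eps: small constant (an assumption of this sketch) that avoids division
         by zero when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct completions receive a positive advantage and incorrect ones a negative advantage; a group in which every completion scores the same yields all-zero advantages and therefore no gradient signal.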

Qwen reasoning variants

Qwen2.5-Math family — math-tuned base for many open reproductions; not itself a reasoning model.
QwQ-32B-Preview (2024-11): early open reasoner predating R1.
Qwen3 (2025+) reasoning variants.

Status: 🟢 open. Frequently used as a base in open RLVR work; appears in notebooks 01–04.

Alibaba / Tongyi reasoning models

Several reasoning-mode variants of the Tongyi family appeared in 2025–2026; they are treated here as siblings of the Qwen line.

Open R1 reproductions

HuggingFace Open-R1 project (2025): open-source R1 recipe reproduction.
SimpleRL (2025): minimal open RLVR.
Open-Reasoner-Zero (2025): another open R1-Zero-style reproduction.
verl (Volcano Engine RL library, 2024–2025): RL framework heavily used for R1 reproductions.

Status: 🟢 open. Listed in Chapter 5.

Tülu / AI2

Tülu 3 (2024-11): open post-training recipe that includes RLVR, and the first recipe to use that name.
Tülu 3.5 and successors.

Status: 🟢 open.
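
RLVR, named in the Tülu 3 entry above, replaces a learned reward model with a programmatic check against a known answer. The sketch below shows one minimal form of such a check for math-style tasks. The `\boxed{...}` answer convention, the exact-string comparison, and both function names are assumptions made for illustration; they are not taken from the Tülu 3 recipe.

```python
import re


def extract_boxed(text):
    """Return the contents of the last boxed answer in text, or None."""
    # Assumes answers are wrapped as \boxed{...} with no nested braces.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion, reference):
    """Binary reward: 1.0 if the extracted final answer equals the reference."""
    answer = extract_boxed(completion)
    if answer is None:
        return 0.0  # no parseable final answer counts as wrong
    return 1.0 if answer == reference.strip() else 0.0


print(verifiable_reward(r"So the total is \boxed{42}.", "42"))  # 1.0
print(verifiable_reward("I think it is 42.", "42"))             # 0.0
```

Real verifiers usually normalize answers (fractions, units, whitespace) before comparing; exact string equality is used here only to keep the sketch short.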

Mistral reasoning variants

Mistral released several reasoning-tuned models in 2025; the lineage is less consolidated than DeepSeek/Qwen and is treated paper-by-paper.

Formal-proof systems (a separate clade)

DeepMind AlphaProof + AlphaGeometry 2

AlphaProof: pretrained LM + AlphaZero-style RL over Lean 4 proofs.
AlphaGeometry 2: geometry-specialized formal prover.
IMO 2024: the combined system solved 4 of 6 problems (28 of 42 points), reaching the silver-medal level.
Nature methodology paper, Nov 2025.

Cited in Chapter 4 as the regime where explicit search and RL are complementary, not competing.
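
For readers new to the formal-proof setting, the toy example below shows what a machine-checkable statement and proof look like in Lean 4. It is not taken from AlphaProof or any IMO problem; the theorem name is made up for this sketch. The relevant property is that the proof checker either accepts or rejects the proof, which gives an unambiguous success signal for search and RL.

```lean
-- Toy Lean 4 theorem: addition on natural numbers is commutative.
-- The checker either accepts this proof term or rejects it.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```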

Other formal-proof RL systems

Several 2025 follow-ups on Lean / Coq with RL and tree search; tracked under WANTED.md.

How to use this catalog

A paper citing “R1” without further qualification usually means the full SFT+RL pipeline R1, not R1-Zero. When R1-Zero is meant, the paper says so.
“o1” without a year typically refers to the initial 2024-09 model; “o3” to the 2024-12-announcement / 2025-rollout successor.
“Claude with thinking” refers to models after Claude 3.5, i.e. Claude 3.7 Sonnet onward; before that, Claude’s CoT was implicit / chat-only.
“Reasoning model” in this list’s prose means an RL-trained, long-CoT-emitting model in one of the families above. We do not call a base instruct model a “reasoning model” even if it is prompted to produce CoT.
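
The reading conventions above can be written down as a small lookup, which may help when normalizing model names in notes or scripts. The table and the `resolve` helper are hypothetical and exist only for this sketch; the default readings are copied from the conventions listed above.

```python
# Hypothetical lookup table encoding the default readings listed above.
DEFAULT_READING = {
    "R1": "DeepSeek-R1 (full SFT+RL pipeline, not R1-Zero)",
    "o1": "OpenAI o1 (initial 2024-09 model)",
    "o3": "OpenAI o3 (2024-12 announcement, 2025 rollout)",
    "Claude with thinking": "post-3.5 Claude with extended thinking",
}


def resolve(name):
    """Return the default reading of an unqualified name.

    Names that are already specific (or unknown) are returned unchanged.
    """
    key = name.strip()
    return DEFAULT_READING.get(key, key)


print(resolve("R1"))
print(resolve("R1-Zero"))
```

Names that are already qualified, such as "R1-Zero", pass through unchanged, matching the convention that a paper says so explicitly when it means R1-Zero.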

What this catalog deliberately omits

Performance numbers (those live in tracker/benchmarks.md).
Architectural details when not load-bearing for the methodology story.
Pre-2024 chain-of-thought-prompted models (those are not reasoning models in the post-o1 sense).
Models without published methodology and without a public release (we have nothing to say about them).

Filed 2026-05-14. Update as the field’s families consolidate or split. PR-friendly.