NOESIS
A language model that thinks in continuous latent space, allocates its own thinking budget per problem, and verifies its latent thoughts before emitting an answer.
NOESIS from νόησις — pure thought; intellect's direct apprehension
Implementation status
From-scratch reimplementation of the primitives — tested on CPU (2026-05)
A from-scratch reference implementation lives in architectures/05-noesis/noesis-lm and passes 87 tests on CPU. The stochastic latent loop's REINFORCE score-function gradient is verified against the analytical form and a finite-difference check. This is a small-scale reimplementation of published ideas (Coconut / latent reasoning), not an original architecture and not a trained model: the reasoning, budget, and verifier modules exist and are unit-tested, but there is no end-to-end training loop, no task, and no comparative result yet. The "target marquee result" below is a goal from the spec, not a measured outcome.
The thesis in one paragraph
Reasoning LLMs (o1, R1, the Claude Extended-Thinking family) have shown that test-time compute scaling works: spending more tokens “thinking” reliably improves accuracy on hard problems. The current production form, chain-of-thought (CoT) in token space, has a fundamental inefficiency: most thinking tokens are linguistic glue (transitions, hedges, restatements) rather than load-bearing reasoning steps. Coconut (Hao et al., 2024; arXiv 2412.06769) attacks this directly: instead of decoding each reasoning step to a word token, it feeds the last hidden state back as the next input embedding, so the model stays in latent space. NOESIS aims to make latent reasoning practical by combining four ideas: a continuous-thought backbone, a learned adaptive think-budget, a latent verifier, and a clean continuous-thought RL formulation. The intended result is a reasoning model that thinks in latent space, knows when to stop thinking, verifies its conclusions, and can be RL-trained end-to-end.
Architecture in one figure
  prompt embeddings
          │
          ▼
┌─────────────────────┐
│ Transformer encoder │
└─────────┬───────────┘
          │
          ▼
┌──────────────────────┐
│ Adaptive K-thoughts  │  policy π_K: how many latent
│ policy π_K(K | h_0)  │  steps to spend? K ∈ {1,2,4,…}
└─────────┬────────────┘
          │ K
          ▼
┌──────────────────────┐
│ Latent thought loop  │  for k=1..K:
│ (Coconut + noise)    │    h_k = LM(h_{k-1}) + σ ε
│                      │  (noise ε enables clean RL)
└─────────┬────────────┘
          │
          ▼
┌──────────────────────┐
│ Latent verifier      │  confidence(h_1,…,h_K) → c
│ (small MLP)          │  if c < τ: extra thinking round
└─────────┬────────────┘
          │
          ▼
     answer head
          │
          ▼
     y_1, y_2, …
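The control flow in the figure can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the reference implementation: `latent_step`, `sample_budget`, `confidence`, and `think`, along with the values chosen for σ, τ, the budget set, and `max_rounds`, are hypothetical stand-ins picked for the example.

```python
import math
import random

def latent_step(h):
    """Hypothetical stand-in for LM(h): a fixed elementwise nonlinearity."""
    return [math.tanh(0.9 * x + 0.1) for x in h]

def sample_budget(h, rng, choices=(1, 2, 4, 8)):
    """Toy budget policy pi_K(K | h): uniform over the allowed budgets.
    A learned policy would condition on h; this stand-in ignores it."""
    return rng.choice(choices)

def confidence(trajectory):
    """Toy verifier: maps the size of the last latent update into (0, 1]."""
    last, prev = trajectory[-1], trajectory[-2]
    delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(last, prev)))
    return math.exp(-delta)

def think(h0, sigma=0.05, tau=0.8, max_rounds=3, seed=0):
    """Run budgeted noisy latent steps; repeat a round while confidence < tau."""
    rng = random.Random(seed)
    trajectory = [list(h0)]
    log_prob = 0.0   # Gaussian log-probability of the sampled latent steps
    steps = 0
    c = 0.0
    for _ in range(max_rounds):
        k_budget = sample_budget(trajectory[-1], rng)
        for _ in range(k_budget):
            mu = latent_step(trajectory[-1])
            eps = [rng.gauss(0.0, 1.0) for _ in mu]
            trajectory.append([m + sigma * e for m, e in zip(mu, eps)])
            # Per-dimension log N(h; mu, sigma^2), written in terms of eps.
            for e in eps:
                log_prob += -0.5 * e * e - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
            steps += 1
        c = confidence(trajectory)
        if c >= tau:
            break   # verifier is satisfied, hand off to the answer head
    return trajectory[-1], steps, c, log_prob

h_final, steps, c, log_prob = think([0.2, -0.4, 0.1, 0.7])
```

Each extra round samples a fresh budget, so `steps` is the total across rounds. The accumulated `log_prob` covers only the latent noise draws; a full policy-gradient objective would also need the log-probability of the sampled budget.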
The reference implementation in Appendix B of the master prompt is the stochastic latent loop with a REINFORCE trajectory_log_prob. Its score-function gradient, d/dμ log p = ε / σ, matches both the analytical form and a properly set up finite-difference numerical gradient. The σ-gradient correctly comes only from the log-normalization term (a subtle bug on this point was caught and fixed in v1 of the reference during development).
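The μ-gradient claim can be checked numerically with a short sketch. It assumes each latent step is a draw h = μ + σ ε with ε ~ N(0, I) and a scalar σ, holds the sampled h fixed, and compares ε / σ against a central finite difference of the Gaussian log-density. The function names are hypothetical, and this is not the `trajectory_log_prob` code from the reference.

```python
import math
import random

def gaussian_log_prob(h, mu, sigma):
    """Log-density of N(mu, sigma^2 I) evaluated at h (lists of floats)."""
    d = len(h)
    quad = sum((hi - mi) ** 2 for hi, mi in zip(h, mu)) / (2.0 * sigma ** 2)
    return -quad - d * math.log(sigma) - 0.5 * d * math.log(2.0 * math.pi)

def score_wrt_mu(eps, sigma):
    """Analytical score: d/dmu log p = (h - mu) / sigma^2 = eps / sigma."""
    return [e / sigma for e in eps]

def finite_diff_wrt_mu(h, mu, sigma, step=1e-5):
    """Central finite difference of log p in each coordinate of mu, h held fixed."""
    grads = []
    for i in range(len(mu)):
        up = list(mu)
        up[i] += step
        down = list(mu)
        down[i] -= step
        diff = gaussian_log_prob(h, up, sigma) - gaussian_log_prob(h, down, sigma)
        grads.append(diff / (2.0 * step))
    return grads

rng = random.Random(0)
sigma = 0.3
mu = [rng.uniform(-1.0, 1.0) for _ in range(4)]
eps = [rng.gauss(0.0, 1.0) for _ in range(4)]
h = [m + sigma * e for m, e in zip(mu, eps)]   # the sample, treated as a constant

analytic = score_wrt_mu(eps, sigma)
numeric = finite_diff_wrt_mu(h, mu, sigma)
max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
```

Holding h fixed while perturbing μ is the step that is easy to get wrong: if the sample is regenerated from the perturbed μ, the difference quotient no longer measures the score function.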
Key contributions
- Continuous-thought backbone (Coconut-style) — reasoning happens in the residual stream, not the token stream.
- Learned adaptive think-budget — a small policy decides how many latent steps to spend, trained with RL where the reward is accuracy − λ · steps.
- Latent verifier — a separate small network reads the continuous-thought trajectory and emits a confidence score; low confidence triggers an extra round of thinking (the latent-space analog of self-consistency).
- Continuous-thought RL — a clean policy-gradient formulation through continuous thoughts. Treats the latent trajectory as a stochastic process whose log-probability is derived from injected perturbation noise. Unlocks R1-style post-training without converting thoughts to tokens.
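The think-budget reward above, accuracy − λ · steps, can be paired with a plain REINFORCE update on a categorical policy over budgets. The sketch below is illustrative only: the budget set, the λ and baseline values, and the function names are assumptions, and the policy is reduced to a bare vector of logits rather than a network conditioned on h_0.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def budget_reward(correct, steps, lam=0.05):
    """Reward from the text: accuracy minus lambda times latent steps."""
    return (1.0 if correct else 0.0) - lam * steps

def reinforce_logit_grad(logits, chosen, reward, baseline):
    """Gradient of (reward - baseline) * log pi(chosen) w.r.t. the logits.
    For a softmax policy, d log pi(chosen) / d logit_i = 1[i == chosen] - pi_i."""
    probs = softmax(logits)
    advantage = reward - baseline
    return [advantage * ((1.0 if i == chosen else 0.0) - p)
            for i, p in enumerate(probs)]

budgets = [1, 2, 4, 8]            # hypothetical budget choices
logits = [0.0, 0.0, 0.0, 0.0]     # uniform policy to start

# Hypothetical episode: the policy picked K = 4 and the answer was correct.
chosen = 2
reward = budget_reward(correct=True, steps=budgets[chosen])
grad = reinforce_logit_grad(logits, chosen, reward, baseline=0.5)

# One gradient-ascent step raises the probability of the chosen budget
# because the advantage is positive.
new_logits = [l + 0.1 * g for l, g in zip(logits, grad)]
```

The λ term is what makes the budget a trade-off: with the assumed values, a correct answer at K = 8 earns 0.6 while a correct answer at K = 1 earns 0.95, so longer thinking only pays off when it changes the outcome.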
Phased plan
| Phase | Deliverable | Status |
|---|---|---|
| 0 — Bootstrap | Repo scaffold; reference stochastic latent loop + REINFORCE log-prob | done |
| 1 — Coconut reproduction | Vanilla continuous-thought decoding at fixed K | done |
| 2 — Stochastic latent + REINFORCE | Noise-injected latent loop; trajectory log-prob verified vs. finite-diff | done |
| 3 — Adaptive K policy | Think-budget policy module (built + tested); not yet trained on a difficulty signal | partial (module only) |
| 4 — Latent verifier | Confidence head + calibration metrics; mode controller triggers extra thinking | done (module) |
| 5 — RL on math/code | GRPO-style RL through latent thoughts; MATH / HumanEval lift | not started |
| 6 — Paper | Marquee plot: ≥ 3× fewer thinking tokens at matched accuracy | not started |
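Phase 4 mentions calibration metrics without naming one. A common choice for a confidence head is expected calibration error (ECE), sketched below with equal-width bins. Treating ECE as the metric, the bin count, and the function name are assumptions made for illustration; the reference implementation may measure calibration differently.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: sample-weighted mean of |accuracy - mean confidence| over
    equal-width confidence bins on [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)   # c == 1.0 goes in the last bin
        bins[idx].append((c, 1.0 if ok else 0.0))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue   # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

# Toy data: two confident correct answers, two uncertain answers (one correct).
confs = [0.95, 0.95, 0.55, 0.55]
hits = [True, True, True, False]
ece = expected_calibration_error(confs, hits)
```

A low ECE matters here because the verifier's threshold τ is only meaningful if a confidence of, say, 0.8 corresponds to roughly 80% of such answers being right.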
Required reading
- Hao et al. 2024/2025 — Coconut (arXiv 2412.06769 v3) — the anchor
- Pfau et al. 2024 — “Let's Think Dot by Dot” (latent tokens as null padding boost reasoning)
- Goyal et al. 2024 — Pause tokens / Filler tokens
- Zelikman et al. 2024 — Quiet-STaR (self-taught reasoning with rationales)
- Akyürek et al. 2024 — test-time training (TTT) for few-shot reasoning
- OpenAI 2024 — o1 system card; DeepSeek 2025 — R1 (discrete-CoT-with-RL baseline)
Target marquee result (goal, not yet measured)
A goal from the spec — not a measured result
≥ 3× fewer total inference tokens than a discrete CoT-RL baseline at matched MATH accuracy. This requires a training loop and task harness that have not been built.