NOESIS

A language model that thinks in continuous latent space, allocates its own thinking budget per problem, and verifies its latent thoughts before emitting an answer.

NOESIS from νόησις — pure thought; intellect's direct apprehension

Implementation status

From-scratch reimplementation of the primitives — tested on CPU (2026-05)

A from-scratch reference implementation lives in architectures/05-noesis/noesis-lm and passes 87 tests on CPU. The stochastic latent loop's REINFORCE score-function gradient is verified against the analytical form and a finite-difference check. This is a small-scale reimplementation of published ideas (Coconut / latent reasoning), not an original architecture and not a trained model: the reasoning, budget, and verifier modules exist and are unit-tested, but there is no end-to-end training loop, no task, and no comparative result yet. The "target marquee result" below is a goal from the spec, not a measured outcome.

The thesis in one paragraph

Reasoning LLMs (o1, R1, the Claude Extended-Thinking family) have shown that test-time compute scaling works: spending more tokens “thinking” reliably improves accuracy on hard problems. The current production form, chain-of-thought (CoT) in token space, has a fundamental inefficiency: most thinking tokens are linguistic glue (transitions, hedges, restatements) rather than load-bearing reasoning steps. Coconut (Hao et al. 2024, arXiv 2412.06769) attacks this directly: instead of decoding each reasoning step to a word token, it feeds the last hidden state back as the next input embedding, so the model stays in latent space. NOESIS aims to make latent reasoning practical by combining four ideas: a continuous-thought backbone, a learned adaptive think-budget, a latent verifier, and a clean continuous-thought RL formulation. The intended result is a reasoning model that thinks in latent space, knows when to stop thinking, verifies its conclusions, and can be RL-trained end-to-end.
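
The Coconut-style feedback described above can be illustrated with a toy loop. This is a minimal sketch, not the reference implementation: toy_step and the weight matrix W are hypothetical stand-ins for a real transformer forward pass, and K is fixed by hand.

```python
import math

def toy_step(h, W):
    """Hypothetical stand-in for one LM forward pass: h -> tanh(W h)."""
    return [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]

def latent_rollout(h0, W, K):
    """Feed the last hidden state back as the next input, K times.

    No token is decoded between steps; the state stays a continuous vector.
    """
    trajectory = []
    h = h0
    for _ in range(K):
        h = toy_step(h, W)
        trajectory.append(h)
    return trajectory

W = [[0.5, -0.2], [0.1, 0.3]]                  # hypothetical weights
thoughts = latent_rollout([1.0, 0.0], W, K=4)  # four latent thoughts h_1..h_4
```

A token-space chain of thought would instead project each h onto the vocabulary, sample a token, and re-embed it before the next step; the loop above skips that round trip.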

Architecture in one figure

           prompt embeddings
                  │
                  ▼
        ┌─────────────────────┐
        │ Transformer encoder │
        └─────────┬───────────┘
                  │
                  ▼
        ┌──────────────────────┐
        │ Adaptive K-thoughts  │  policy π_K: how many latent
        │ policy π_K(K | h_0)  │  steps to spend? K ∈ {1,2,4,…}
        └─────────┬────────────┘
                  │  K
                  ▼
        ┌──────────────────────┐
        │ Latent thought loop  │  for k=1..K:
        │ (Coconut + noise)    │    h_k = LM(h_{k-1}) + σ ε
        │                      │    (noise ε enables clean RL)
        └─────────┬────────────┘
                  │
                  ▼
        ┌──────────────────────┐
        │ Latent verifier      │  confidence(h_1,…,h_K) → c
        │ (small MLP)          │  if c < τ: extra thinking round
        └─────────┬────────────┘
                  │
                  ▼
              answer head
                  │
                  ▼
              y_1, y_2, …
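
The control flow in the figure can be sketched roughly as follows. Everything here is a toy under stated assumptions: lm_step, choose_k, and verifier are hypothetical stand-ins (a fixed rule instead of a learned policy, a distance heuristic instead of a trained MLP), and the values of sigma, tau, and the max_rounds cap are illustrative choices that do not come from the spec.

```python
import math
import random

def lm_step(h):
    """Hypothetical stand-in for LM(h_{k-1}); the real model is a transformer."""
    return [math.tanh(0.9 * x + 0.1) for x in h]

def choose_k(h0, choices=(1, 2, 4)):
    """Hypothetical budget policy pi_K(K | h_0): a fixed rule, not a learned one."""
    norm = math.sqrt(sum(x * x for x in h0))
    return choices[min(len(choices) - 1, int(norm))]

def verifier(trajectory):
    """Hypothetical confidence in (0, 1]: high when the last step barely moved the state."""
    last, prev = trajectory[-1], trajectory[-2]
    delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(last, prev)))
    return 1.0 / (1.0 + delta)

def think(h0, sigma=0.05, tau=0.8, max_rounds=3, rng=None):
    """Run K noisy latent steps, then repeat the round while confidence < tau."""
    rng = rng or random.Random(0)
    trajectory = [h0]
    k = choose_k(h0)
    for _round in range(max_rounds):       # cap on extra rounds is an assumption
        for _ in range(k):
            mean = lm_step(trajectory[-1])
            # h_k = LM(h_{k-1}) + sigma * eps, with eps ~ N(0, I)
            trajectory.append([m + sigma * rng.gauss(0.0, 1.0) for m in mean])
        c = verifier(trajectory)
        if c >= tau:
            break
    steps = len(trajectory) - 1            # latent steps actually spent
    return trajectory[-1], steps, c

final_state, steps_used, confidence = think([1.0, 0.5, -0.5])
```

The final state would then go to the answer head; the sketch stops before decoding.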

The reference implementation in Appendix B of the master prompt is the stochastic latent loop together with its REINFORCE trajectory_log_prob. Writing each step as h_k = μ_k + σ ε with μ_k = LM(h_{k-1}), the score-function gradient d/dμ log p = ε / σ matches both the analytical form and a finite-difference numerical gradient taken with the sampled h_k held fixed. The σ-gradient correctly comes only from the log-normalization term (a subtle bug caught and fixed in v1 of the reference during development).
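
The μ-gradient check described above can be reproduced in a few lines. This is a sketch under assumptions, not the project's test: it uses a single isotropic Gaussian step with hand-picked mu and sigma, the helper name gaussian_log_prob is hypothetical, and it checks only the μ-gradient.

```python
import math
import random

def gaussian_log_prob(h, mu, sigma):
    """Log-density of h under an isotropic Gaussian N(mu, sigma^2 I)."""
    return sum(
        -0.5 * ((x - m) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)
        for x, m in zip(h, mu)
    )

rng = random.Random(0)
sigma = 0.3                                   # illustrative noise scale
mu = [0.2, -1.0, 0.7]                         # illustrative step mean, i.e. LM(h_{k-1})
eps = [rng.gauss(0.0, 1.0) for _ in mu]
h = [m + sigma * e for m, e in zip(mu, eps)]  # sampled thought, held fixed below

# Analytical score: d/dmu log p = (h - mu) / sigma^2 = eps / sigma
analytic = [e / sigma for e in eps]

# Central finite difference in mu, with the sample h held fixed
delta = 1e-5
numeric = []
for i in range(len(mu)):
    up = list(mu)
    up[i] += delta
    down = list(mu)
    down[i] -= delta
    diff = gaussian_log_prob(h, up, sigma) - gaussian_log_prob(h, down, sigma)
    numeric.append(diff / (2 * delta))

max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
```

Holding h fixed is what makes the comparison meaningful: if h were resampled as mu + sigma * eps after each perturbation, the log-probability would not change with mu at all.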

Key contributions

Continuous-thought backbone (Coconut-style) — reasoning happens in the residual stream, not the token stream.
Learned adaptive think-budget — a small policy decides how many latent steps to spend, trained with RL where the reward is accuracy − λ · steps.
Latent verifier — a separate small network reads the continuous-thought trajectory and emits a confidence score; low confidence triggers an extra round of thinking (the latent-space analog of self-consistency).
Continuous-thought RL — a clean policy-gradient formulation through continuous thoughts. Treats the latent trajectory as a stochastic process whose log-probability is derived from injected perturbation noise. Unlocks R1-style post-training without converting thoughts to tokens.
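
The budget objective in the second item (reward = accuracy − λ · steps) can be sketched as a REINFORCE update on a categorical policy over K. This is an illustration only: the function names, the choice set [1, 2, 4], the value of lam, and the zero baseline are assumptions, and the project's status notes say no training loop exists yet.

```python
import math

def budget_reward(correct, steps, lam=0.01):
    """Reward = accuracy - lambda * steps for one problem (accuracy is 0 or 1)."""
    return float(correct) - lam * steps

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_logit_grad(logits, chosen, reward, baseline=0.0):
    """Gradient of (reward - baseline) * log pi(chosen) with respect to the logits.

    For a softmax policy, d/dz_i log pi(chosen) = 1[i == chosen] - pi_i.
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [advantage * ((1.0 if i == chosen else 0.0) - p) for i, p in enumerate(probs)]

choices = [1, 2, 4]                # hypothetical budget options
logits = [0.0, 0.0, 0.0]           # uniform policy to start
reward = budget_reward(correct=True, steps=choices[2], lam=0.05)
grad = reinforce_logit_grad(logits, chosen=2, reward=reward)
```

A gradient-ascent step along grad raises the probability of the chosen budget when the reward is positive; a larger lam shrinks that reward for long trajectories, which is the pressure toward shorter thinking.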

Phased plan

NOESIS's phased implementation plan.
Phase	Deliverable	Status
0 — Bootstrap	Repo scaffold; reference stochastic latent loop + REINFORCE log-prob	done
1 — Coconut reproduction	Vanilla continuous-thought decoding at fixed K	done
2 — Stochastic latent + REINFORCE	Noise-injected latent loop; trajectory log-prob verified vs. finite-diff	done
3 — Adaptive K policy	Think-budget policy module (built + tested); not yet trained on a difficulty signal	partial (module only)
4 — Latent verifier	Confidence head + calibration metrics; mode controller triggers extra thinking	done (module)
5 — RL on math/code	GRPO-style RL through latent thoughts; MATH / HumanEval lift	not started
6 — Paper	Marquee plot: ≥ 3× fewer thinking tokens at matched accuracy	not started

Required reading

Hao et al. 2024/2025 — Coconut (arXiv:2412.06769v3) — the anchor
Pfau et al. 2024 — “Let's Think Dot by Dot” (meaningless filler tokens can stand in for chain-of-thought on some tasks)
Goyal et al. 2024 — Pause tokens / Filler tokens
Zelikman et al. 2024 — Quiet-STaR (self-taught reasoning with rationales)
Akyürek et al. 2024 — TTT for few-shot reasoning
OpenAI 2024 — o1 system card; DeepSeek 2025 — R1 (discrete-CoT-with-RL baseline)

Target marquee result (not yet measured)

A goal from the spec — not a measured result

≥ 3× fewer total inference tokens than a discrete CoT-RL baseline at matched MATH accuracy. This requires a training loop and task harness that have not been built.