Research scaffold

A model that thinks in latent space.

NOESIS is a research scaffold for continuous-thought reasoning: the model thinks in the residual stream instead of the token stream, allocates its own thinking budget per problem, and verifies its latent thoughts before emitting an answer.

87 tests passing
0 fabricated results
4 / 4 components built
Runs on CPU

What is this?

Reasoning LLMs (o1, R1, Claude with extended thinking) have shown that spending more tokens on "thinking" before answering reliably improves accuracy. The current production form, chain-of-thought in token space, has a fundamental inefficiency: most thinking tokens are linguistic glue (transitions, hedges, restatements) rather than load-bearing reasoning steps.

NOESIS thinks in continuous latent space instead. After emitting <bot>, the last hidden state is projected through a learned matrix into the next input embedding — no token is decoded, no commitment is made. The model can keep multiple alternatives alive in superposition. After K such steps, it emits <eot> and resumes language-mode generation.
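The latent loop described above can be sketched in a few lines. This is a toy illustration, not the NOESIS implementation: `toy_backbone` stands in for a full transformer forward pass, and the names `latent_think`, `bridge`, `backbone_weights`, and the 4-dimensional random weights are all assumptions made for the example.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def toy_backbone(input_embedding, weights):
    """Hypothetical stand-in for a forward pass: returns a 'last hidden state'."""
    return [math.tanh(v) for v in matvec(weights, input_embedding)]

def latent_think(bot_embedding, backbone_weights, bridge, num_steps):
    """Run num_steps latent thoughts. No token is decoded inside this loop."""
    input_embedding = bot_embedding  # embedding of the <bot> marker
    trajectory = []
    for _ in range(num_steps):
        hidden = toy_backbone(input_embedding, backbone_weights)
        trajectory.append(hidden)
        # Bridge: project the hidden state into the next input embedding.
        input_embedding = matvec(bridge, hidden)
    return trajectory  # the caller would emit <eot> and resume decoding

def random_matrix(rng, dim):
    """Small random square matrix, used here in place of learned weights."""
    return [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]

rng = random.Random(0)
dim = 4
trajectory = latent_think(
    bot_embedding=[1.0, 0.0, 0.0, 0.0],
    backbone_weights=random_matrix(rng, dim),
    bridge=random_matrix(rng, dim),
    num_steps=3,  # K in the text
)
print(len(trajectory), len(trajectory[0]))
```

The point of the sketch is the data flow: each step consumes a continuous vector and produces another, so nothing is rounded to a vocabulary item until the loop ends.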

The four ideas

NOESIS combines four mechanisms that make latent reasoning practical:

1. Continuous-thought backbone

Reasoning happens in the residual stream. The last hidden state is fed back as the next input embedding via a learned bridge.

2. Adaptive think-budget

A small policy decides how many latent steps to spend on a given problem. It is trained with reward = accuracy − λ·steps, where λ sets the cost of each extra latent step.
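As a rough illustration of the reward above, the snippet below scores three hypothetical outcomes. The function name `think_reward`, the 0/1 accuracy encoding, and the value of `STEP_PENALTY` (λ) are assumptions for the example, not values taken from the project.

```python
def think_reward(is_correct, num_steps, step_penalty):
    """reward = accuracy - lambda * steps, for a single problem."""
    accuracy = 1.0 if is_correct else 0.0
    return accuracy - step_penalty * num_steps

STEP_PENALTY = 0.01  # hypothetical lambda

quick_right = think_reward(True, 2, STEP_PENALTY)
slow_right = think_reward(True, 20, STEP_PENALTY)
slow_wrong = think_reward(False, 20, STEP_PENALTY)
print(quick_right, slow_right, slow_wrong)
```

With this shaping, a correct answer reached in fewer steps always scores higher than the same answer reached in more, and λ controls how much accuracy an extra step must buy to be worth taking.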

3. Latent verifier

A separate head reads the trajectory and emits a confidence score. Low confidence triggers a retry. The head also serves as the RL baseline.
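One way the verify-then-retry flow could look is sketched below. Everything specific here is an assumption: the mean-pooling, the linear-plus-sigmoid head, the `threshold`, the `max_attempts` cap, and the canned trajectories used in place of real latent thoughts.

```python
import math

def verifier_confidence(trajectory, head_weights, head_bias):
    """Mean-pool the latent trajectory, then apply a linear layer and a sigmoid."""
    dim = len(trajectory[0])
    pooled = [sum(step[i] for step in trajectory) / len(trajectory) for i in range(dim)]
    logit = sum(w * x for w, x in zip(head_weights, pooled)) + head_bias
    return 1.0 / (1.0 + math.exp(-logit))

def think_with_retry(run_latent_steps, score, threshold, max_attempts):
    """Re-run latent thinking until the verifier is confident or attempts run out."""
    best_trajectory, best_confidence, attempts_used = None, -1.0, 0
    for attempt in range(max_attempts):
        attempts_used += 1
        trajectory = run_latent_steps(attempt)
        confidence = score(trajectory)
        if confidence > best_confidence:
            best_trajectory, best_confidence = trajectory, confidence
        if confidence >= threshold:
            break  # confident enough, stop retrying
    return best_trajectory, best_confidence, attempts_used

# Canned trajectories standing in for two latent-thinking attempts.
canned = [
    [[-1.0, -1.0], [-1.0, -1.0]],  # first attempt: verifier will score this low
    [[1.0, 1.0], [1.0, 1.0]],      # second attempt: verifier will score this high
]
trajectory, confidence, attempts = think_with_retry(
    run_latent_steps=lambda attempt: canned[attempt],
    score=lambda traj: verifier_confidence(traj, [1.0, 1.0], 0.0),
    threshold=0.5,
    max_attempts=2,
)
print(attempts, round(confidence, 3))
```

Keeping the best-scoring trajectory means the caller still has something to decode from if every attempt falls below the threshold.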

4. Continuous-thought RL

Gaussian perturbation on latent thoughts gives a well-defined trajectory log-probability, unlocking REINFORCE through continuous reasoning.
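To make the log-probability concrete, here is a minimal sketch assuming each latent thought is sampled as its deterministic value plus isotropic Gaussian noise with a fixed `sigma`. The function names, the fixed `means`, and the reward and baseline values are illustrative assumptions. In a real model each mean would depend on the previously sampled thoughts, and the baseline could come from a learned head.

```python
import math
import random

def gaussian_log_prob(sample, mean, sigma):
    """Log-density of an isotropic Gaussian N(mean, sigma^2 I) at `sample`."""
    dim = len(sample)
    squared_error = sum((s - m) ** 2 for s, m in zip(sample, mean))
    return (-0.5 * squared_error / sigma**2
            - dim * math.log(sigma)
            - 0.5 * dim * math.log(2.0 * math.pi))

def perturb(mean, sigma, rng):
    """Sample a noisy latent thought around its deterministic value."""
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]

def trajectory_log_prob(samples, means, sigma):
    """Sum of per-step Gaussian log-densities over the whole trajectory."""
    return sum(gaussian_log_prob(s, m, sigma) for s, m in zip(samples, means))

def reinforce_grad_wrt_means(samples, means, sigma, reward, baseline):
    """REINFORCE estimate: (reward - baseline) * d log p / d mean, per step."""
    advantage = reward - baseline
    return [[advantage * (s - m) / sigma**2 for s, m in zip(sample, mean)]
            for sample, mean in zip(samples, means)]

rng = random.Random(0)
sigma = 0.1
means = [[0.0, 0.0], [1.0, -1.0]]  # stand-ins for two deterministic thoughts
samples = [perturb(mean, sigma, rng) for mean in means]

log_prob = trajectory_log_prob(samples, means, sigma)
grads = reinforce_grad_wrt_means(samples, means, sigma, reward=1.0, baseline=0.6)
print(log_prob)
print(grads)
```

Because the noise is Gaussian, the gradient of the log-probability with respect to each mean has the closed form `(sample - mean) / sigma**2`, which is what lets the score-function estimator flow back into whatever produced the means.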

Current state

Every architectural piece of NOESIS exists in code and is tested: 87 / 87 tests pass.

The two gating tests from spec §4 Phase 1 both pass.

What is intentionally not here

No fabricated benchmark numbers, no literature notes written from memory, no training runs without compute behind them. The repo's "no fabricated results" rule applies to documentation too.

Read on