Why HLM

One Core. Matched Frontends.

Transformers typically need a different architecture variant for each modality. Vision transformers, audio transformers, and multimodal transformers each bolt on modality-specific encoders, tokenizers, and projection layers. The result is an engineering patchwork where text, images, and audio rarely expose the same editable substrate.

HLM uses a shared core for everything: polynomial Hopfield layers. The modality entry point is allowed to match the data geometry — causal token mixing for language, 2D spatial mixing for images, temporal mixing for audio, and spatial frontends for 3D data — while the Hopfield core keeps the same energy-landscape semantics.

The shared core isn't a convenience; it's a fundamental difference in how the model builds representations.

Transformer vs HLM

How Transformers Understand

A transformer processes tokens through attention layers that compute weighted averages over the input sequence. Understanding emerges from statistical correlation — which tokens co-occur, which patterns predict the next token.

This works remarkably well for language. But it has structural limitations:

  • No discrete memory. Knowledge is distributed across billions of parameters. You cannot point to where a specific fact is stored.
  • No compositional structure. Attention is a flat operation. Hierarchical reasoning requires stacking many layers to approximate what should be a structural property.
  • No shared editable substrate. Text, images, and audio usually need separate towers and alignment objectives before they can communicate.
  • Fragile under editing. Because knowledge is spread across many weights, changing parameters to alter one behavior tends to disturb others. There is no surgical target.
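The weighted-average view of attention described above can be made concrete with a small example. This is a minimal sketch of single-query scaled dot-product attention in plain Python, not any particular library's implementation; the function names `softmax` and `attend` and the toy keys and values are illustrative assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-query scaled dot-product attention.

    The output is a softmax-weighted average of the value vectors,
    so it always lies between the smallest and largest value in
    each dimension.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


# Toy data (hypothetical): three positions, two dimensions each.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([1.0, 0.0], keys, values)
```

Every position contributes to the output with a nonzero weight, which is why no single stored item can be read back exactly from one attention step.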

How HLM Understands

An HLM layer stores knowledge as attractor basins in an energy landscape. Each basin is a stable memory pattern — a local minimum that the network converges to when it encounters similar input.

This changes how the model relates to what it knows:

  • Discrete, addressable memory. Each concept occupies a specific basin. You can find it, measure it, and modify it.
  • Compositional by construction. Basins combine through energy superposition. A "polite + technical" concept is the natural blend of two energy minima — not a learned trick.
  • Shared Hopfield substrate. Text, images, spatial data, and audio can use modality-matched frontends while exposing the same kind of basin object to Energy Language operations.
  • Surgically editable. Because knowledge has a location, you can operate on it. Inject, remove, move, blend — without touching anything else.
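The idea of a basin as a stable state that nearby inputs converge to can be illustrated with a toy associative memory. This is a minimal sketch, assuming a polynomial energy of the form E(s) = -Σ (pattern · s)^n, as used in dense associative memories. The page does not specify HLM's actual energy function or update rule, and the names `energy` and `settle` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def energy(state, patterns, degree=3):
    """Assumed polynomial energy: E(s) = -sum over patterns of (pattern . s)^degree."""
    return -sum(dot(p, state) ** degree for p in patterns)


def settle(state, patterns, degree=3, max_sweeps=10):
    """Asynchronous descent: set each unit to whichever sign gives lower energy.

    Each update can only keep or lower the energy, so the state
    ends in a local minimum (a basin).
    """
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            up = state[:i] + [1] + state[i + 1:]
            down = state[:i] + [-1] + state[i + 1:]
            new = 1 if energy(up, patterns, degree) <= energy(down, patterns, degree) else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break  # no unit wants to flip: we are at a local minimum
    return state


# Two orthogonal stored patterns (toy data, not from any HLM checkpoint).
pattern_a = [1, 1, 1, 1, -1, -1, -1, -1]
pattern_b = [1, -1, 1, -1, 1, -1, 1, -1]
patterns = [pattern_a, pattern_b]

# pattern_a with two units flipped (positions 0 and 5).
probe = [-1, 1, 1, 1, -1, 1, -1, -1]
recovered = settle(probe, patterns)
```

In this toy setup the corrupted probe settles back into `pattern_a`, and the energy at the recovered state is lower than at the probe. A real layer would use continuous states and learned patterns; the sketch only shows what "converging to a basin" means.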

Building World Models

The difference matters most when you ask: what does the model actually know?

A transformer knows correlations. It knows that "the cat sat on the ___" is likely followed by "mat" because that pattern appears in the training data. It does not have a representation of cats, mats, or sitting that exists independent of the token sequence.

An HLM stores attractors. A basin is a stable state that the network converges to from related inputs. The basin is the concept, and it exists in the energy landscape whether or not a specific input is present.

This is closer to how world models should work:

Property | Transformer | HLM
Knowledge format | Distributed weights | Discrete attractor basins
Concept representation | Implicit (statistical) | Explicit (energy minima)
Multimodal grounding | Learned alignment between towers | Matched frontends into a shared Hopfield substrate
Compositionality | Approximated through depth | Natural through energy superposition
Editability | None (retrain) | Surgical (basin operations)
Interpretability | Opaque | Basins are measurable states

The Convergence Principle

The validated principle is stricter than "frontend alone." Inputs with real structure need a matched stack before they reach the shared Hopfield core:

    modality frontend + mixer + readout + Hopfield core

Language, images, audio, spatial data, and sensor streams each keep the frontend and mixing pattern their geometry requires. The common layer is the energy-landscape representation exposed after that bridge.

The result is a model family that can produce predictions while still exposing a stable landscape of concepts that can be surveyed, constrained, audited, and composed across deployment tiers.

Practical Implications

For deployment: One Hopfield core pattern handles text, vision, spatial, and audio, with frontends that match the data shape.

For customization: Edit behavior in milliseconds. Survey, capture, inject, move, and audit basin-level behavior. No retraining for local surgery.

For understanding: Survey what the model knows. Measure it. Verify it. The energy landscape is not a black box — it's a programmable surface.

For safety: Remove specific knowledge surgically. Guard against over-modification. Audit the operation log. Every change is traceable because every concept has a location.
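To show what basin-level operations with an audit trail could look like, here is a toy sketch. It is not the Energy Language API, which this page does not document. The class `BasinMemory`, its methods `inject`, `remove`, and `survey`, and the polynomial energy are all hypothetical stand-ins chosen for illustration.

```python
class BasinMemory:
    """Toy store of named patterns with an append-only operation log.

    Hypothetical illustration only: names and energy form are assumptions.
    """

    def __init__(self, degree=3):
        self.degree = degree
        self.basins = {}  # name -> stored pattern (list of +1/-1)
        self.log = []     # (operation, name) tuples in the order applied

    def inject(self, name, pattern):
        """Add a named pattern and record the operation."""
        self.basins[name] = list(pattern)
        self.log.append(("inject", name))

    def remove(self, name):
        """Delete a named pattern and record the operation."""
        del self.basins[name]
        self.log.append(("remove", name))

    def energy(self, state):
        """Assumed polynomial energy summed over all stored patterns."""
        return -sum(
            sum(p * s for p, s in zip(pattern, state)) ** self.degree
            for pattern in self.basins.values()
        )

    def survey(self):
        """Report the energy measured at each stored pattern."""
        return {name: self.energy(p) for name, p in self.basins.items()}


memory = BasinMemory()
memory.inject("polite", [1, 1, 1, 1, -1, -1, -1, -1])
memory.inject("technical", [1, -1, 1, -1, 1, -1, 1, -1])
before = memory.survey()

memory.remove("technical")
after = memory.survey()
```

Here the two patterns are orthogonal, so removing "technical" leaves the energy measured at "polite" unchanged. Overlapping patterns would interact, which is one reason a real system would need the over-modification guards mentioned above.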

Early Access

Validated demo checkpoints are listed on the current status page. Join the waitlist for early access, or contact us for commercial and pilot projects.