Why Polynomial Hopfield
The entire Energy Language approach depends on one architectural decision: using polynomial Hopfield networks instead of transformers.
The Core Difference
Transformers use softmax attention, which is equivalent to a polynomial interaction with degree d → ∞. As d increases, the energy landscape smooths out until only a single attractor remains — a global minimum that everything collapses into. There is nothing left to surgically edit.
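The collapse toward a single winner can be illustrated numerically. Normalized degree-d polynomial weights, z_i^d / Σ_j z_j^d, behave like softmax at growing inverse temperature: as d increases, the distribution concentrates entirely on the maximum entry. (An illustrative toy of the limiting behavior, not the actual attention computation; `poly_weights` is a hypothetical name.)

```python
import numpy as np

def poly_weights(z, d):
    """Normalized degree-d polynomial weights over positive scores z."""
    p = z ** d
    return p / p.sum()

z = np.array([1.0, 2.0, 3.0])
print(poly_weights(z, 2))   # modest spread: [1/14, 4/14, 9/14]
print(poly_weights(z, 50))  # nearly all mass collapses onto the max entry
```

At d=2 every score keeps meaningful weight; by d=50 the second-largest score retains a weight of roughly (2/3)^50 ≈ 10^-9 — effectively a single attractor.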
Polynomial Hopfield networks (d=3) maintain a rich energy landscape with 200+ discrete attractor basins per layer. Each basin is a stable memory pattern — a local minimum in the energy function. These basins can be individually targeted for surgery.
The Math
The energy function for a polynomial Hopfield layer:
E(x) = -1/d · |x|^d + 0.5 · x^T · W · f(x)

where f(x) = sign(x) · |x|^(d-1) is the polynomial interaction function.
- When d → ∞ (softmax): one basin, smooth landscape
- When d = 3 (HLM): many basins, rich landscape, each surgically accessible
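As a concrete sketch, the energy above fits in a few lines of NumPy. This assumes |x|^d denotes the elementwise sum Σ_i |x_i|^d and that f applies elementwise; the function names are illustrative, not the HLM implementation:

```python
import numpy as np

def f(x, d=3):
    """Polynomial interaction: sign(x) * |x|^(d-1), applied elementwise."""
    return np.sign(x) * np.abs(x) ** (d - 1)

def energy(x, W, d=3):
    """E(x) = -1/d * sum_i |x_i|^d + 0.5 * x^T W f(x)."""
    return -np.sum(np.abs(x) ** d) / d + 0.5 * x @ W @ f(x, d)

# With no couplings (W = 0) only the self term remains:
x = np.array([1.0, 2.0])
W = np.zeros((2, 2))
print(energy(x, W))  # -(1^3 + 2^3)/3 = -3.0
```

Attractor basins are the local minima of this function; gradient descent on E(x) from any starting state relaxes into the nearest basin.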
Why d=3 Specifically?
- d=2 (classical Hopfield): basins exist but capacity is limited (~0.14N patterns for N neurons)
- d=3 (HLM): far more basins — dense associative memory theory puts capacity on the order of N^(d-1) patterns — with sharper separation and better surgical precision
- d→∞ (transformer): one basin, no surgery possible
The sweet spot is finite d large enough for capacity but small enough to maintain discrete, separable basins.
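The capacity argument can be sanity-checked with a toy retrieval experiment: store random ±1 patterns and recover one from a corrupted probe using a degree-3 interaction. This is a minimal sketch assuming the standard Krotov–Hopfield-style synchronous update s ← sign(Σ_μ ξ_μ · (ξ_μ·s)^(d-1)), not the HLM training code:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, d = 100, 5, 3
patterns = rng.choice([-1.0, 1.0], size=(K, N))  # K stored memories

def update(s):
    """One synchronous degree-d update: each stored pattern 'votes'
    with weight overlap^(d-1), so well-aligned patterns dominate."""
    overlaps = patterns @ s
    return np.sign(patterns.T @ overlaps ** (d - 1))

# Corrupt pattern 0 by flipping 10 of its 100 bits, then relax.
s = patterns[0].copy()
s[:10] *= -1
for _ in range(5):
    s = update(s)
print(np.array_equal(s, patterns[0]))  # recovery check
```

With d=3 the aligned pattern's vote scales like overlap^2 (≈ 6400 here) while crosstalk from the other memories stays near ~100 per pattern, so the probe falls back into its own basin rather than a merged global minimum.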
Implications
This isn't a limitation of transformers that can be patched — it's a fundamental property of the softmax function. To make neural networks surgically programmable, the architecture must support discrete attractor basins. That's what polynomial Hopfield layers provide.