Use Case: Research Notebook for Interpretability
Use HLM3's multi-basin structure + Energy Language operations as a research substrate for interpretability work. Each basin is addressable, observable, and manipulable — which makes HLM3 a uniquely clean testbed for questions that are hard to pose on transformers.
Who this is for
- Interpretability researchers looking for a cleaner substrate than post-hoc transformer analysis
- PhD students picking a research direction where "what is the model actually doing" can be investigated with discrete, addressable operations
- Academic groups working on energy-based models, associative memory, or mechanistic interpretability who want weights + a full operation language to probe with
The problem you're solving
Interpretability research on transformers is heavy on reconstruction: probe for concepts, cluster attention patterns, train sparse autoencoders on activations, run SAE-feature-attribution experiments. Valuable work, but every result is "how the model seems to work from the outside." The primitives you're probing (attention distributions, MLP activations) are continuous and entangled.
HLM3's multi-basin structure gives you something different — discrete, addressable memory locations that you can measure, perturb, or cut directly:
- 47-200 basins per layer (seed- and dimension-dependent; see the research paper for the basin-counting study)
- Every basin has coordinates in the hidden-state energy landscape
- Every basin can be individually surveyed, injected, removed, moved
- Every forward pass records which basins it converged to (no post-hoc reconstruction needed; a quick sketch follows below)
This is a fundamentally different research surface from transformers. It's not better or worse — it's cleaner for a different class of question.
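As a quick illustration of that last point, here is a minimal sketch. It assumes a checkpoint is already loaded as in Step 1 of the walkthrough, and the exact fields in the trajectory's dict form may differ in your version:
# sketch: read off which basins a single forward pass settled into
traj = surgeon.trace_generation('The cat sat on the mat', layer=5)
print(traj.to_dict())  # the recorded basin trajectory; no probes or reconstruction needed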
What you'll build
A Jupyter notebook-based research workflow:
- Load an HLM3 checkpoint
- Survey the full basin structure per layer
- Capture a concept ("polite tone" / "mathematics" / "refusal") from examples
- Watch the captured concept appear as a new basin in the landscape
- Probe how other basins respond — do adjacent basins shift? Which basins does the new one trade activation with?
- Run causal scans: if I intervene on basin 15 (set activation to 0), what breaks?
- Verify effects by generating text before and after each operation
Every step produces a reproducible artefact — the basin structure at each stage is saved, addressable, and versionable.
The stack
| Piece | Source |
|---|---|
| HLM3 checkpoint | Waitlist — commercial / research access |
| Jupyter integration | qriton-hlm has native Jupyter support — docs |
| Energy Language operations | 36 operations reference |
| Your research scaffold | Any — PyTorch, NumPy, matplotlib, whatever you already use |
Walkthrough
Step 1 — Load in Jupyter
# notebook cell 1
from qriton_hlm import BasinSurgeon
from qriton_hlm.jupyter import visualise_landscape, visualise_trajectory
surgeon = BasinSurgeon.from_checkpoint('hlm3-large-ffn.pt')
surgeon.num_layers() # → 8
Step 2 — Survey baseline basin structure
# notebook cell 2
for layer in range(surgeon.num_layers()):
s = surgeon.survey(layer=layer)
print(f'L{layer}: {s["num_basins"]} basins, '
f'entropy={s["usage_entropy_nats"]:.2f} nats, '
f'β={s["beta"]:.2f}')
# Expected output for HLM3-Large+FFN (~200-basin configuration, dim=512):
# L0: 200 basins, entropy=4.23 nats, β=1.03
# L1: 198 basins, entropy=4.21 nats, β=2.14
# ...
# L7: 199 basins, entropy=3.87 nats, β=6.93  ← self-learned beta-depth hierarchy
The monotonic β increase with depth is a real structural finding: shallow layers learn smooth mixing, deep layers learn sharp categorical attractors, much like early visual cortex versus deep categorical areas in the brain.
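If you want the curve rather than the printed table, here is a minimal sketch that plots β against depth from the same survey calls; matplotlib is assumed to be available in your scaffold:
# sketch: plot the β-depth curve from the per-layer surveys
import matplotlib.pyplot as plt

layers = list(range(surgeon.num_layers()))
betas = [surgeon.survey(layer=l)['beta'] for l in layers]

plt.plot(layers, betas, marker='o')
plt.xlabel('layer')
plt.ylabel('β')
plt.title('Self-learned β-depth hierarchy')
plt.show()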
Step 3 — Visualise the landscape
# notebook cell 3
visualise_landscape(surgeon, layer=5, dim_reduction='umap', n_probes=500)
# → interactive 2D projection of basin centroids + probe trajectories
You can now see the basin structure of L5 and watch where text inputs settle. This is a concrete visualisation of the "cat" vs "mat" vs "code" basins that transformers don't give you.
Step 4 — Capture a concept and watch it land
# notebook cell 4
# Baseline
before_survey = surgeon.survey(layer=5)
# Capture "polite" from 3 examples
surgeon.capture(layer=5, concept='polite', text='Thank you for reaching out')
surgeon.capture(layer=5, concept='polite', text='I appreciate your question')
surgeon.capture(layer=5, concept='polite', text='It would be my pleasure to help')
# Inject at 0.1 strength
surgeon.inject_concept(layer=5, concept='polite', strength=0.1)
# After
after_survey = surgeon.survey(layer=5)
print(f'Before: {before_survey["num_basins"]} basins')
print(f'After: {after_survey["num_basins"]} basins (+{after_survey["num_basins"]-before_survey["num_basins"]})')
# Where in the landscape did the new basin land?
visualise_landscape(surgeon, layer=5, highlight_concept='polite')
You literally see the new basin appear on the UMAP projection. You can measure its distance to neighbouring basins. You can check which existing basins' radii shrank to make room.
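Beyond the raw count, you can diff the two surveys numerically; a minimal sketch using only the summary fields shown in Step 2:
# sketch: compare summary statistics before and after the injection
for key in ('num_basins', 'usage_entropy_nats', 'beta'):
    delta = after_survey[key] - before_survey[key]
    print(f'{key}: {before_survey[key]:.2f} → {after_survey[key]:.2f} (Δ{delta:+.2f})')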
Step 5 — Probe interactions between basins
# notebook cell 5
# Which existing basins overlap with 'polite'?
neighbours = surgeon.find_basin_neighbours(layer=5, concept='polite', top_k=5)
for b in neighbours:
print(f'Basin {b["id"]} (cos {b["cosine_similarity"]:.3f}): probed with → "{b["example_probe"]}"')
# Expected output:
# Basin 142 (cos 0.61): probed with → 'greeting'
# Basin 89 (cos 0.54): probed with → 'acknowledgment'
# Basin 203 (cos 0.48): probed with → 'helpful'
# ...
This tells you what 'polite' is near in the learned representation space — a direct measurement, not an SAE hypothesis. Compare across layers: is 'polite' near 'formal' in L7 but near 'greeting' in L2?
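One way to run that comparison is to capture the concept at each layer and ask for its nearest neighbour per layer. A minimal sketch; it assumes capture and find_basin_neighbours accept any layer index and that neighbours can be queried for a freshly captured concept:
# sketch: nearest neighbour of 'polite' at every depth
for layer in range(surgeon.num_layers()):
    surgeon.capture(layer=layer, concept='polite', text='Thank you for reaching out')
    top = surgeon.find_basin_neighbours(layer=layer, concept='polite', top_k=1)[0]
    print(f'L{layer}: basin {top["id"]} (cos {top["cosine_similarity"]:.3f})')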
Step 6 — Causal intervention
# notebook cell 6
# What happens if we zero out basin 142 (the "greeting" neighbour)?
trajectory_before = surgeon.trace_generation('Hi, can you help me?', layer=5)
surgeon.intervene(layer=5, basin_id=142, intervention='zero')
trajectory_after = surgeon.trace_generation('Hi, can you help me?', layer=5)
# Visualise: same input, different trajectories through the landscape
visualise_trajectory([trajectory_before, trajectory_after], layer=5)
This is Pearl-style do(X) causal intervention on discrete model-internal variables. The operation is reversible (surgeon.revert()), so you can explore a tree of counterfactual model states.
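Because revert() restores the previous state, a systematic scan is just a loop. A minimal sketch, assuming surgeon.revert() undoes the most recent intervention:
# sketch: intervention scan over the neighbour basins found in Step 5
probe = 'Hi, can you help me?'
baseline = surgeon.trace_generation(probe, layer=5)
for basin_id in (142, 89, 203):
    surgeon.intervene(layer=5, basin_id=basin_id, intervention='zero')
    perturbed = surgeon.trace_generation(probe, layer=5)
    # compare `perturbed` against `baseline` with whatever divergence measure you use
    surgeon.revert()  # restore the un-intervened model before the next scan step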
Step 7 — Export experiment artefacts
Every state of the model during your investigation is addressable:
# notebook cell 7
import json

experiment_record = {
'baseline_survey': before_survey,
'post_injection_survey': after_survey,
'injected_concept': surgeon.export_concept('polite'),
'neighbour_analysis': neighbours,
'causal_intervention': {
'zeroed_basin': 142,
'trajectory_before': trajectory_before.to_dict(),
'trajectory_after': trajectory_after.to_dict(),
},
'seed': 42,
'checkpoint_hash': surgeon.model_weights_hash(),
}
with open('experiment-2026-04-24.json', 'w') as f:
    json.dump(experiment_record, f, indent=2)
The experiment is now a fully reproducible artefact — anyone with the checkpoint hash can replay every operation deterministically.
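Before replaying a record, check that you are holding the same checkpoint. A minimal sketch of that guard, using the filenames from above:
# sketch: guard a replay against checkpoint drift
import json
from qriton_hlm import BasinSurgeon

with open('experiment-2026-04-24.json') as f:
    record = json.load(f)

replay = BasinSurgeon.from_checkpoint('hlm3-large-ffn.pt')
assert replay.model_weights_hash() == record['checkpoint_hash'], \
    'checkpoint mismatch: the recorded operations will not replay deterministically'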
Research directions this enables
Real questions that are hard on transformers and natural on HLM3:
- Concept composition: can we blend two orthogonal concepts (e.g. 'formal' + 'mathematical') and produce a basin that lives at the expected intermediate point? (Sketched below.)
- Concept locality: is 'polite' stored as a single basin or distributed across 2-3? Does it matter for downstream behaviour?
- Cross-model portability: if I export 'polite' from HLM3-large and import it into HLM3-medium, does the injected basin end up in a "similar" location? How do you measure "similar" across different hidden-dim spaces?
- Basin dynamics under training: track how individual basins move during fine-tuning. Do they drift smoothly or jump discretely? (Analogous to the "feature emergence" literature on transformers, but directly observable.)
- Causal graph discovery: run systematic do() interventions to build a causal graph over basins. Is the resulting graph stable across seeds / architectures?
Each is a publishable research direction. The Energy Language operations reference has the primitives you'd use for each.
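For the first direction, a minimal sketch of how you might start: capture and inject two concepts at the same layer, then ask where each lands relative to the other. This is only one possible operationalisation of "blending", built from the operations already shown in the walkthrough; the example texts are placeholders:
# sketch: a first pass at concept composition
surgeon.capture(layer=5, concept='formal', text='Pursuant to our earlier correspondence')
surgeon.capture(layer=5, concept='mathematical', text='Let f be a continuous function on [0, 1]')
surgeon.inject_concept(layer=5, concept='formal', strength=0.1)
surgeon.inject_concept(layer=5, concept='mathematical', strength=0.1)

# are the two injected basins each other's nearest neighbours, or did they merge?
print(surgeon.find_basin_neighbours(layer=5, concept='formal', top_k=3))
print(surgeon.find_basin_neighbours(layer=5, concept='mathematical', top_k=3))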
Caveats & what to read next
- Research access is via waitlist: request access with your institutional affiliation.
- Published basin counts are seed- and dimension-dependent. Don't take a single number as the architecture's fingerprint; always report the mean ± std across N seeds.
- Interpretability research is sensitive to checkpoint identity. Different HLM3 checkpoints have different basin structures; an experiment on hlm3-large-ffn-v1.4.2 won't replicate on v1.5.0 without retraining.
- Surgery on research checkpoints can degrade capability. Always verify after each operation if your downstream measurement depends on capability being preserved.