Use Case: Research Notebook for Interpretability
Use HLM3's multi-basin structure + Energy Language operations as a research substrate for interpretability work. Each basin is addressable, observable, and manipulable — which makes HLM3 a uniquely clean testbed for questions that are hard to pose on transformers.
Who this is for
- Interpretability researchers looking for a cleaner substrate than post-hoc transformer analysis
- PhD students picking a research direction where "what is the model actually doing" can be investigated with discrete, addressable operations
- Academic groups working on energy-based models, associative memory, or mechanistic interpretability who want weights + a full operation language to probe with
The problem you're solving
Interpretability research on transformers is heavy on reconstruction: probe for concepts, cluster attention patterns, train sparse autoencoders on activations, run SAE-feature-attribution experiments. Valuable work, but every result is "how the model seems to work from the outside." The primitives you're probing (attention distributions, MLP activations) are continuous and entangled.
HLM3's multi-basin structure gives you something different — discrete, addressable memory locations that you can measure, perturb, or cut directly:
- 47-200 basins per layer (seed- and dimension-dependent; see the research paper for the basin-counting study)
- Every basin has coordinates in the hidden-state energy landscape
- Every basin can be individually surveyed, injected, removed, moved
- Every forward pass records which basins it converged to (no post-hoc reconstruction needed; a quick sketch follows below)
This is a fundamentally different research surface from transformers. It's not better or worse — it's cleaner for a different class of question.
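As a quick illustration of that last point, here is a minimal sketch. It assumes a checkpoint is already loaded as in Step 1 of the walkthrough, and the exact fields in the trajectory's dict form may differ in your version:
# sketch: read off which basins a single forward pass settled into
traj = surgeon.trace_generation('The cat sat on the mat', layer=5)
print(traj.to_dict())  # the recorded basin trajectory; no probes or reconstruction needed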
What you'll build
A Jupyter notebook-based research workflow:
- Load an HLM3 checkpoint
- Survey the full basin structure per layer
- Capture a concept ("polite tone" / "mathematics" / "refusal") from examples
- Watch the captured concept appear as a new basin in the landscape
- Probe how other basins respond — do adjacent basins shift? Which basins does the new one trade activation with?
- Run causal scans: if I intervene on basin 15 (set activation to 0), what breaks?
- Verify effects by generating text before and after each operation
Every step produces a reproducible artefact — the basin structure at each stage is saved, addressable, and versionable.
The stack
| Piece | Source |
|---|---|
| HLM3 checkpoint | Waitlist — commercial / research access |
| Jupyter integration | qriton-hlm has native Jupyter support — docs |
| Energy Language operations | 36 operations reference |
| Your research scaffold | Any — PyTorch, NumPy, matplotlib, whatever you already use |
Walkthrough
Step 1 — Load in Jupyter
# notebook cell 1
from qriton_hlm import BasinSurgeon
from qriton_hlm.jupyter import visualise_landscape, visualise_trajectory
surgeon = BasinSurgeon.from_checkpoint('hlm3-large-ffn.pt')
surgeon.num_layers() # → 8
Step 2 — Survey baseline basin structure
# notebook cell 2
for layer in range(surgeon.num_layers()):
s = surgeon.survey(layer=layer)
print(f'L{layer}: {s["num_basins"]} basins, '
f'entropy={s["usage_entropy_nats"]:.2f} nats, '
f'β={s["beta"]:.2f}')
# Expected output for HLM3-Large+FFN (~200-basin configuration, dim=512):
# L0: 200 basins, entropy=4.23 nats, β=1.03
# L1: 198 basins, entropy=4.21 nats, β=2.14
# ...
# L7: 199 basins, entropy=3.87 nats, β=6.93  ← self-learned beta-depth hierarchy
The monotonic β increase with depth is a real structural finding: shallow layers learn smooth mixing, deep layers learn sharp categorical attractors, much like early visual cortex versus deep categorical areas in the brain.
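If you want the curve rather than the printed table, here is a minimal sketch that plots β against depth from the same survey calls; matplotlib is assumed to be available in your scaffold:
# sketch: plot the β-depth curve from the per-layer surveys
import matplotlib.pyplot as plt

layers = list(range(surgeon.num_layers()))
betas = [surgeon.survey(layer=l)['beta'] for l in layers]

plt.plot(layers, betas, marker='o')
plt.xlabel('layer')
plt.ylabel('β')
plt.title('Self-learned β-depth hierarchy')
plt.show()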
Step 3 — Visualise the landscape
# notebook cell 3
visualise_landscape(surgeon, layer=5, dim_reduction='umap', n_probes=500)
# → interactive 2D projection of basin centroids + probe trajectories
You can now see the basin structure of L5 and watch where text inputs settle. This is a concrete visualisation of the "cat" vs "mat" vs "code" basins that transformers don't give you.
Step 4 — Capture a concept and watch it land
# notebook cell 4
# Baseline
before_survey = surgeon.survey(layer=5)
# Capture "polite" from 3 examples
surgeon.capture(layer=5, concept='polite', text='Thank you for reaching out')
surgeon.capture(layer=5, concept='polite', text='I appreciate your question')
surgeon.capture(layer=5, concept='polite', text='It would be my pleasure to help')
# Inject at 0.1 strength
surgeon.inject_concept(layer=5, concept='polite', strength=0.1)
# After
after_survey = surgeon.survey(layer=5)
print(f'Before: {before_survey["num_basins"]} basins')
print(f'After: {after_survey["num_basins"]} basins (+{after_survey["num_basins"]-before_survey["num_basins"]})')
# Where in the landscape did the new basin land?
visualise_landscape(surgeon, layer=5, highlight_concept='polite')
You literally see the new basin appear on the UMAP projection. You can measure its distance to neighbouring basins. You can check which existing basins' radii shrank to make room.
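Beyond the raw count, you can diff the two surveys numerically; a minimal sketch using only the summary fields shown in Step 2:
# sketch: compare summary statistics before and after the injection
for key in ('num_basins', 'usage_entropy_nats', 'beta'):
    delta = after_survey[key] - before_survey[key]
    print(f'{key}: {before_survey[key]:.2f} → {after_survey[key]:.2f} (Δ{delta:+.2f})')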
Step 5 — Probe interactions between basins
# notebook cell 5
# Which existing basins overlap with 'polite'?
neighbours = surgeon.find_basin_neighbours(layer=5, concept='polite', top_k=5)
for b in neighbours:
print(f'Basin {b["id"]} (cos {b["cosine_similarity"]:.3f}): probed with → "{b["example_probe"]}"')
# Expected output:
# Basin 142 (cos 0.61): probed with → 'greeting'
# Basin 89 (cos 0.54): probed with → 'acknowledgment'
# Basin 203 (cos 0.48): probed with → 'helpful'
# ...
This tells you what 'polite' is near in the learned representation space — a direct measurement, not an SAE hypothesis. Compare across layers: is 'polite' near 'formal' in L7 but near 'greeting' in L2?
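One way to run that comparison is to capture the concept at each layer and ask for its nearest neighbour per layer. A minimal sketch; it assumes capture and find_basin_neighbours accept any layer index and that neighbours can be queried for a freshly captured concept:
# sketch: nearest neighbour of 'polite' at every depth
for layer in range(surgeon.num_layers()):
    surgeon.capture(layer=layer, concept='polite', text='Thank you for reaching out')
    top = surgeon.find_basin_neighbours(layer=layer, concept='polite', top_k=1)[0]
    print(f'L{layer}: basin {top["id"]} (cos {top["cosine_similarity"]:.3f})')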
Step 6 — Causal intervention
# notebook cell 6
# What happens if we zero out basin 142 (the "greeting" neighbour)?
trajectory_before = surgeon.trace_generation('Hi, can you help me?', layer=5)
surgeon.intervene(layer=5, basin_id=142, intervention='zero')
trajectory_after = surgeon.trace_generation('Hi, can you help me?', layer=5)
# Visualise: same input, different trajectories through the landscape
visualise_trajectory([trajectory_before, trajectory_after], layer=5)
This is Pearl-style do(X) causal intervention on discrete model-internal variables. The operation is reversible (surgeon.revert()), so you can explore a tree of counterfactual model states.
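Because revert() restores the previous state, a systematic scan is just a loop. A minimal sketch, assuming surgeon.revert() undoes the most recent intervention:
# sketch: intervention scan over the neighbour basins found in Step 5
probe = 'Hi, can you help me?'
baseline = surgeon.trace_generation(probe, layer=5)
for basin_id in (142, 89, 203):
    surgeon.intervene(layer=5, basin_id=basin_id, intervention='zero')
    perturbed = surgeon.trace_generation(probe, layer=5)
    # compare `perturbed` against `baseline` with whatever divergence measure you use
    surgeon.revert()  # restore the un-intervened model before the next scan step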
Step 7 — Export experiment artefacts
Every state of the model during your investigation is addressable:
# notebook cell 7
import json

experiment_record = {
'baseline_survey': before_survey,
'post_injection_survey': after_survey,
'injected_concept': surgeon.export_concept('polite'),
'neighbour_analysis': neighbours,
'causal_intervention': {
'zeroed_basin': 142,
'trajectory_before': trajectory_before.to_dict(),
'trajectory_after': trajectory_after.to_dict(),
},
'seed': 42,
'checkpoint_hash': surgeon.model_weights_hash(),
}
with open('experiment-2026-04-24.json', 'w') as f:
    json.dump(experiment_record, f, indent=2)
The experiment is now a fully reproducible artefact — anyone with the checkpoint hash can replay every operation deterministically.
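Before replaying a record, check that you are holding the same checkpoint. A minimal sketch of that guard, using the filenames from above:
# sketch: guard a replay against checkpoint drift
import json
from qriton_hlm import BasinSurgeon

with open('experiment-2026-04-24.json') as f:
    record = json.load(f)

replay = BasinSurgeon.from_checkpoint('hlm3-large-ffn.pt')
assert replay.model_weights_hash() == record['checkpoint_hash'], \
    'checkpoint mismatch: the recorded operations will not replay deterministically'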
Research directions this enables
Real questions that are hard on transformers and natural on HLM3:
- Concept composition: can we blend two orthogonal concepts (e.g. 'formal' + 'mathematical') and produce a basin that lives at the expected intermediate point? (Sketched below.)
- Concept locality: is 'polite' stored as a single basin or distributed across 2-3? Does it matter for downstream behaviour?
- Cross-model portability: if I export 'polite' from HLM3-large and import it into HLM3-medium, does the injected basin end up in a "similar" location? How do you measure "similar" across different hidden-dim spaces?
- Basin dynamics under training: track how individual basins move during fine-tuning. Do they drift smoothly or jump discretely? (Analogous to the "feature emergence" literature on transformers, but directly observable.)
- Causal graph discovery: run systematic do() interventions to build a causal graph over basins. Is the resulting graph stable across seeds / architectures?
Each is a publishable research direction. The Energy Language operations reference has the primitives you'd use for each.
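For the first direction, a minimal sketch of how you might start: capture and inject two concepts at the same layer, then ask where each lands relative to the other. This is only one possible operationalisation of "blending", built from the operations already shown in the walkthrough; the example texts are placeholders:
# sketch: a first pass at concept composition
surgeon.capture(layer=5, concept='formal', text='Pursuant to our earlier correspondence')
surgeon.capture(layer=5, concept='mathematical', text='Let f be a continuous function on [0, 1]')
surgeon.inject_concept(layer=5, concept='formal', strength=0.1)
surgeon.inject_concept(layer=5, concept='mathematical', strength=0.1)

# are the two injected basins each other's nearest neighbours, or did they merge?
print(surgeon.find_basin_neighbours(layer=5, concept='formal', top_k=3))
print(surgeon.find_basin_neighbours(layer=5, concept='mathematical', top_k=3))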
Caveats & what to read next
- Research access is via waitlist: request access with your institutional affiliation.
- Published basin counts are seed- and dimension-dependent. Don't take a single number as the architecture's fingerprint; always report the mean ± std across N seeds.
- Interpretability research is sensitive to checkpoint identity. Different HLM3 checkpoints have different basin structures; an experiment on hlm3-large-ffn-v1.4.2 won't replicate on v1.5.0 without retraining.
- Surgery on research checkpoints can degrade capability. Always verify after each operation if your downstream measurement depends on capability being preserved.