Concept Surgery

Capture what a concept looks like inside a model, then inject it as a new attractor basin.

Prerequisites

The capture step requires a full HLM3 model checkpoint. The W matrices alone are not enough.

Capture

Run text through the model and extract the converged Hopfield state:

python
surgeon = BasinSurgeon.from_checkpoint("hlm3-model.pt", device="cuda")

surgeon.capture(layer=5, text="Thank you so much", concept_name="polite")
surgeon.capture(layer=5, text="I really appreciate it", concept_name="polite")
surgeon.capture(layer=5, text="That's very kind of you", concept_name="polite")

Each capture call adds one sample to the named concept. The concept centroid is the mean of all converged states captured under that name.
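
To make the averaging step concrete, here is a minimal sketch of a component-wise centroid over a few captured states. The helper `concept_centroid` and the sample vectors are hypothetical and only illustrate the arithmetic; they are not part of the `BasinSurgeon` API.

```python
from statistics import fmean


def concept_centroid(states):
    """Average equal-length state vectors component by component."""
    if not states:
        raise ValueError("need at least one captured state")
    dim = len(states[0])
    if any(len(s) != dim for s in states):
        raise ValueError("all states must share the same dimension")
    # zip(*states) groups the i-th component of every state together.
    return [fmean(component) for component in zip(*states)]


# Three made-up 4-dimensional "converged states" for one concept.
samples = [
    [1.0, 0.0, 2.0, -1.0],
    [3.0, 0.0, 2.0, 1.0],
    [2.0, 3.0, 2.0, 0.0],
]
centroid = concept_centroid(samples)
print(centroid)
```

Because the centroid is a plain mean, every sample carries equal weight, so a single unrepresentative capture shifts the result less as more samples are added.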

Inject

Program the captured concept as a new attractor:

python
result = surgeon.inject_concept(layer=5, concept_name="polite", strength=0.1)
print(f"Basin created: {result['exists_after']}")
print(f"Basin count: {result['basins_before']} -> {result['basins_after']}")
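
How `inject_concept` edits the weights is not described here. As an illustration of the general idea only, the sketch below uses a classical binary Hopfield network with the Hebbian outer-product rule, which may differ from what HLM3 does internally. All function names are hypothetical, and `strength` is assumed to scale the added outer product.

```python
def sign(x):
    return 1 if x >= 0 else -1


def hebbian_weights(patterns):
    """Classical Hopfield weights: sum of outer products / n, zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def inject(w, pattern, strength):
    """Return new weights with strength * outer(pattern, pattern) added off-diagonal."""
    n = len(pattern)
    return [
        [w[i][j] + (strength * pattern[i] * pattern[j] if i != j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


def is_fixed_point(w, state):
    """A state is an attractor if one synchronous update leaves it unchanged."""
    n = len(state)
    return all(
        sign(sum(w[i][j] * state[j] for j in range(n))) == state[i]
        for i in range(n)
    )


existing = [1, 1, 1, 1, -1, -1, -1, -1]   # pattern already stored
concept = [1, -1, 1, -1, 1, -1, 1, -1]    # new pattern, orthogonal to it

w_before = hebbian_weights([existing])
w_after = inject(w_before, concept, strength=0.1)

exists_before = is_fixed_point(w_before, concept)
exists_after = is_fixed_point(w_after, concept)
old_survives = is_fixed_point(w_after, existing)
print(exists_before, exists_after, old_survives)
```

In this toy setting the new pattern becomes a fixed point while the previously stored one stays stable, which is the behavior the before and after basin counts are meant to confirm.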

Blend

Mix two concepts at any ratio:

python
surgeon.capture(layer=5, text="The algorithm is O(n log n)", concept_name="technical")
surgeon.capture(layer=5, text="The gradient descent converges", concept_name="technical")

# 60% polite, 40% technical
surgeon.blend("polite", "technical", "professional", ratio=0.6)
surgeon.inject_concept(layer=5, concept_name="professional", strength=0.1)
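
One plausible reading of `ratio`, consistent with the "60% polite, 40% technical" comment, is a linear interpolation between the two concept centroids. The sketch below assumes that interpretation; `blend_centroids` and the toy vectors are hypothetical, and the real `blend` method may normalize or weight differently.

```python
def blend_centroids(first, second, ratio):
    """Linear interpolation: ratio * first + (1 - ratio) * second."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    if len(first) != len(second):
        raise ValueError("centroids must have the same dimension")
    return [ratio * a + (1.0 - ratio) * b for a, b in zip(first, second)]


# Toy 2-dimensional centroids, chosen so the weights are easy to read off.
polite = [1.0, 0.0]
technical = [0.0, 1.0]

professional = blend_centroids(polite, technical, ratio=0.6)
print(professional)
```

Under this assumption, `ratio=1.0` reproduces the first concept, `ratio=0.0` reproduces the second, and the argument order matters for any other value.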

Export and Import

Make concepts portable:

python
surgeon.export_concept("polite", "polite.concept")

# On another machine or model
surgeon.import_concept("polite.concept")

Transplant

Copy a concept from one model to another:

python
source = BasinSurgeon.from_checkpoint("model_a.pt")
target = BasinSurgeon.from_checkpoint("model_b.pt")

source.capture(layer=5, text="Thank you", concept_name="polite")
target.transplant(source, layer=5, concept_name="polite")

Both models must have the same d_model dimension.
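
A cheap way to fail fast on this constraint is to compare dimensions before transplanting. The guard below is a hypothetical sketch: how `BasinSurgeon` exposes `d_model` is not shown in this tutorial, so the function takes plain integers and a concept vector instead.

```python
def check_transplant_compatible(source_d_model, target_d_model, concept_vector):
    """Raise ValueError if the concept cannot fit the target model."""
    if source_d_model != target_d_model:
        raise ValueError(
            f"d_model mismatch: source={source_d_model}, target={target_d_model}"
        )
    if len(concept_vector) != target_d_model:
        raise ValueError(
            f"concept has {len(concept_vector)} dims, expected {target_d_model}"
        )


vector = [0.0] * 512

# Matching dimensions pass silently.
check_transplant_compatible(512, 512, vector)
ok = True

# Mismatched dimensions are rejected before any weights are touched.
try:
    check_transplant_compatible(512, 768, vector)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```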

Apply and Verify

python
surgeon.apply(layer=5)

# Check perplexity impact
result = surgeon.benchmark()
print(f"Perplexity: {result['perplexity']:.2f}")

# Generate to see the effect
surgeon.generate("Tell me about the weather")
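
For reading the benchmark number, it helps to recall the standard definition of perplexity: the exponential of the negative mean per-token log-probability. Whether `surgeon.benchmark()` computes it exactly this way, and on which data, is not stated here. The sketch below uses made-up probabilities to show the arithmetic.

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Made-up example: the model assigns each of 4 tokens probability 0.25
# before the edit and 0.20 after it.
baseline = perplexity([math.log(0.25)] * 4)
after_edit = perplexity([math.log(0.20)] * 4)
print(f"Perplexity: {baseline:.2f} -> {after_edit:.2f}")
```

Lower is better. Comparing the value before and after `apply` on the same evaluation text gives a rough measure of how much the injected basin disturbs general language modeling.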