Use Case: Programmable AI Assistant Persona
Design a chatbot assistant whose tone / style / refusal behaviour is edited via Energy Language surgery instead of retraining or prompt engineering. Ship personality changes in milliseconds without touching the weights you didn't mean to touch.
Who this is for
- Chatbot product leads whose deployed LLM needs a persona update (more formal / more supportive / less verbose) without a three-week fine-tuning cycle
- LLM-app developers stuck between "fine-tune the whole model" and "stuff everything into the system prompt"
- Brand / content teams who want to capture "what does our brand voice look like?" as a concept and inject it into their production model
The problem you're solving
The standard toolkit for persona control has three failure modes:
- Prompt engineering: brittle. A new system prompt that makes the assistant "formal" also subtly changes how it handles math, coding, refusals. Small prompt edits produce large behavioural shifts in surprising ways.
- Fine-tuning: expensive, slow, and global. You retrained it to be more polite — now it's also worse at SQL. There's no way to localise the edit.
- LoRA adapters: cheaper but still black-box. You can't tell which behaviours the adapter altered without extensive evaluation, and you can't mix two personas at 50/50 without retraining.
Basin surgery fixes these. You capture the abstract concept of "polite" from a handful of examples, inject it as a discrete attractor in the energy landscape, blend at any ratio, export for reuse, remove if you don't like it. Every operation is addressable, reversible, and auditable.
What you'll build
A working chatbot where you can:
- Capture "polite" from 3–5 examples
- Inject it as a new basin at 10% strength (subtle) or 30% (strong)
- Blend with "technical" at a 70/30 mix for a formal-but-technical assistant
- Export the persona and import it into a different HLM3 checkpoint
- Revert instantly if the edit made something else worse
All edits happen in milliseconds on a laptop.
The stack
| Piece | Source |
|---|---|
| HLM3 (pre-trained language model) | HLM3 — waitlist for weights |
| HLM-Audio (optional voice output) | HLM-Audio |
| Energy Language CLI or Python API | qriton-hlm package |
| Your application layer | Flask / FastAPI / whatever — the API surface is standard Python |
Walkthrough
Step 1 — Load a checkpoint and survey
$ qriton-hlm -c hlm3-large-ffn.pt
hlm:hlm3-large-ffn> survey 5
Layer 5: 47 basins found (200 inits, β=7.00)
hlm:hlm3-large-ffn> generate Hello, how can I help you today?
Hi, I can help with that. What do you need?

Baseline response — neutral tone.
Step 2 — Capture the "polite" concept from examples
hlm:hlm3-large-ffn> capture 5 polite Thank you so much for your question, I'd be happy to help
Captured L5 → concept 'polite' (1 samples)
Energy: -12.34 | Basin: True (cos=0.97, 23 iters)
hlm:hlm3-large-ffn> capture 5 polite I truly appreciate you reaching out. Let me assist with that
Captured L5 → concept 'polite' (2 samples; averaged)
hlm:hlm3-large-ffn> capture 5 polite Absolutely, it would be my pleasure to help you with that
Captured L5 → concept 'polite' (3 samples; averaged)

The capture operation averages the settled states of each example in the landscape — it's not memorising the strings, it's finding the region of the energy landscape these texts collectively live in.
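A minimal pure-Python sketch of that averaging idea. The three-dimensional "settled state" vectors here are made up for illustration — real HLM3 states are high-dimensional, and the settling dynamics (running each example down to its energy minimum) are not shown:

```python
# Illustrative only: averages a list of "settled state" vectors the way
# `capture` accumulates samples into one concept vector.

def average_settled_states(settled):
    """Component-wise mean of equal-length state vectors."""
    n = len(settled)
    dim = len(settled[0])
    return [sum(vec[i] for vec in settled) / n for i in range(dim)]

# Hypothetical settled states for the three "polite" examples above
samples = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [1.0, 0.0, 0.2],
]
concept = average_settled_states(samples)
print(concept)  # the region the three examples collectively occupy
```

This is why one sample is not enough: with a single example, the "average" is just that example's settled state, so you capture the string rather than the concept.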
Step 3 — Inject as a new basin at controlled strength
hlm:hlm3-large-ffn> inject-concept 5 polite 0.1
Before: 47 basins, concept is basin: False
After: 48 basins (+1), concept is basin: True (strength=0.1)
>> Concept successfully injected!
hlm:hlm3-large-ffn> apply 5
Layer 5 changes committed to model state.
hlm:hlm3-large-ffn> generate Hello, how can I help you today?
Hi there — I'd be delighted to help. What do you have in mind?

The tone shifted. At strength 0.1 the edit is subtle; try 0.3 for strong, 0.5 for dominant.
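One toy mental model of the strength parameter — an assumption for illustration, not qriton-hlm's actual mechanics: injecting a concept carves a well into the energy landscape at the concept's location, and strength sets how deep the well is.

```python
import math

# Toy 1-D energy landscape (NOT the real HLM3 landscape). Injection
# subtracts a Gaussian well at the concept position; a deeper well is a
# stronger attractor.

def base_energy(x):
    return (x - 2.0) ** 2          # hypothetical existing basin at x = 2

def injected_energy(x, concept_pos, strength, width=0.5):
    well = strength * math.exp(-((x - concept_pos) ** 2) / width)
    return base_energy(x) - well

# At strength 0.1 the new well barely dents the landscape near x = 0;
# at 0.5 it pulls much harder.
for s in (0.1, 0.5):
    print(s, injected_energy(0.0, concept_pos=0.0, strength=s))
```

Under this picture, "subtle vs dominant" is just well depth relative to the basins already present.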
Step 4 — Verify & benchmark
hlm:hlm3-large-ffn> verify 5
Basin count: 48 (was 47) ✓
Other basins unchanged: 47/47 ✓
Output coherence check: 100 samples generated, 100 parsed ✓
Perplexity delta: +0.03 (baseline 48.3, now 48.33) — within noise ✓

`verify 5` runs checks confirming the surgery didn't break other parts of the model. The perplexity delta is your canary for "did this edit cost me capability somewhere else?"
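That gate is easy to script around the numbers verify reports. The helper and its thresholds below are hypothetical; the figures are taken from the transcript above:

```python
# Hypothetical post-surgery gate: the edit passes only if it added exactly
# the expected number of basins AND perplexity drift stays within a noise
# tolerance.

def edit_is_safe(basins_before, basins_after, ppl_before, ppl_after,
                 expected_new_basins=1, ppl_tolerance=0.1):
    added_ok = (basins_after - basins_before) == expected_new_basins
    drift_ok = abs(ppl_after - ppl_before) <= ppl_tolerance
    return added_ok and drift_ok

# Numbers from the transcript: 47 -> 48 basins, perplexity 48.3 -> 48.33
print(edit_is_safe(47, 48, 48.3, 48.33))  # True
```

In a deployment pipeline, a False here would trigger an immediate undo rather than shipping the edit.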
Step 5 — Blend personas
hlm:hlm3-large-ffn> capture 5 technical The optimisation uses stochastic gradient descent
hlm:hlm3-large-ffn> capture 5 technical The dependency graph shows three unresolved cycles
hlm:hlm3-large-ffn> capture 5 technical This class inherits from BaseRepository and overrides save()
hlm:hlm3-large-ffn> blend 5 polite technical 0.7 0.3 --as formal_technical
Blended L5: polite (0.7) + technical (0.3) → formal_technical
New concept: formal_technical (3 basins)
hlm:hlm3-large-ffn> inject-concept 5 formal_technical 0.2
hlm:hlm3-large-ffn> apply 5
hlm:hlm3-large-ffn> generate Explain how quicksort works
I'd be glad to walk you through it. Quicksort is a divide-and-conquer
algorithm that operates in O(n log n) on average...

That's formal tone + technical content, programmatically combined. No retraining, no prompt engineering, instant.
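A sketch of blending as a weighted combination of concept vectors. Both the vectors and the linear-mix assumption are illustrative, not the library's internals:

```python
# Illustrative only: `blend` pictured as a weighted combination of two
# concept vectors. Real concepts live in HLM3's hidden space.

def blend_concepts(a, b, wa, wb):
    assert abs(wa + wb - 1.0) < 1e-9, "weights should sum to 1"
    return [wa * x + wb * y for x, y in zip(a, b)]

polite = [0.9, 0.1, 0.0]       # hypothetical captured vectors
technical = [0.0, 0.2, 0.8]
formal_technical = blend_concepts(polite, technical, 0.7, 0.3)
print(formal_technical)
```

Because the mix is just arithmetic on captured vectors, a 50/50 persona (or any other ratio) costs nothing beyond re-running the blend — the contrast with retraining a LoRA per ratio.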
Step 6 — Export the persona for reuse
hlm:hlm3-large-ffn> export-concept formal_technical ./personas/formal_technical.qcon
Exported: 1 basin, 3.2 KB, SHA-256: a1b2c3...
hlm:hlm3-large-ffn> exit
# In a different HLM3 checkpoint:
$ qriton-hlm -c hlm3-other-checkpoint.pt
hlm:hlm3-other> import-concept 5 ./personas/formal_technical.qcon
Imported formal_technical (3 basins) into L5.
Compatibility check: ✓ same architecture, ✓ compatible dim
hlm:hlm3-other> inject-concept 5 formal_technical 0.2
hlm:hlm3-other> apply 5
hlm:hlm3-other> generate Explain recursion
[generates formal_technical-toned response on a different model]

Concepts are portable across HLM3 checkpoints. Train your brand voice once, deploy it across production + staging + per-customer variants.
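Since export prints a SHA-256 digest, a deployment script can re-hash the .qcon file before importing it on another checkpoint, catching corrupted or tampered persona files. A self-contained sketch (the file written here is a stand-in, not a real persona export):

```python
import hashlib
import tempfile

def sha256_of(path):
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

# Stand-in for an exported .qcon file
with tempfile.NamedTemporaryFile(suffix='.qcon', delete=False) as f:
    f.write(b'persona payload')
    path = f.name

recorded = sha256_of(path)            # digest noted at export time
assert sha256_of(path) == recorded    # digests match -> safe to import
```

The same check works as a pre-import hook in CI, so a persona file that drifted in transit never reaches a production checkpoint.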
Step 7 — Revert if needed
hlm:hlm3-other> undo
Reverted 1 operation: inject-concept 5 formal_technical 0.2
hlm:hlm3-other> undo
Reverted 1 operation: import-concept 5 formal_technical

Surgery is reversible per-operation. A bad edit never has to stick.
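Why per-operation undo is cheap: conceptually, each edit can push an inverse action onto a stack, and undo pops and runs the latest one. A toy sketch, not the real qriton-hlm internals:

```python
# Toy operation log: every edit records its inverse; `undo` replays the
# most recent inverse, so operations unwind in LIFO order.

class OpLog:
    def __init__(self):
        self.stack = []

    def record(self, name, inverse):
        self.stack.append((name, inverse))

    def undo(self):
        name, inverse = self.stack.pop()
        inverse()
        return name

basins = {'polite'}                    # hypothetical layer state
log = OpLog()

basins.add('formal_technical')         # the "edit"
log.record('inject-concept formal_technical',
           lambda: basins.discard('formal_technical'))

print(log.undo())                      # replays the inverse
print(basins)                          # back to the pre-edit state
```

The LIFO ordering is why the transcript above reverts the inject before the import: undo always unwinds the most recent operation first.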
Step 8 — Integrate with your application
```python
from fastapi import FastAPI
from pydantic import BaseModel
from qriton_hlm import BasinSurgeon

app = FastAPI()
surgeon = BasinSurgeon.from_checkpoint('hlm3-large-ffn.pt')

# Apply the production persona once at startup
surgeon.import_concept(5, './personas/formal_technical.qcon')
surgeon.inject_concept(5, 'formal_technical', strength=0.2)
surgeon.apply(5)

class ChatRequest(BaseModel):
    prompt: str

@app.post('/chat')
async def chat(req: ChatRequest):
    return {'reply': surgeon.generate(req.prompt)}
```

The persona is now baked into the loaded model for the lifetime of the process. Swap personas at runtime with `surgeon.revert()` + `import_concept(other_persona)`.
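That runtime swap is worth wrapping in a helper. `SwapRecorder` below is a stand-in for `BasinSurgeon` so the sketch runs on its own; the call sequence (revert, then import, inject, apply) mirrors the API shown above, and the persona path and name are hypothetical:

```python
# Sketch of a runtime persona swap. Only the call sequence matters; the
# recorder class exists so the pattern is runnable without qriton-hlm.

def swap_persona(surgeon, layer, path, name, strength):
    surgeon.revert()                               # drop the current persona
    surgeon.import_concept(layer, path)            # load the new one
    surgeon.inject_concept(layer, name, strength=strength)
    surgeon.apply(layer)                           # commit to model state

class SwapRecorder:                                # stand-in for BasinSurgeon
    def __init__(self):
        self.calls = []
    def revert(self):
        self.calls.append('revert')
    def import_concept(self, layer, path):
        self.calls.append('import')
    def inject_concept(self, layer, name, strength):
        self.calls.append('inject')
    def apply(self, layer):
        self.calls.append('apply')

s = SwapRecorder()
swap_persona(s, 5, './personas/casual.qcon', 'casual', 0.2)
print(s.calls)  # ['revert', 'import', 'inject', 'apply']
```

In a live service you'd guard this with a lock so no request hits `generate` while the layer is mid-swap.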
Caveats & what to read next
- Basin surgery at strength > 0.3 can degrade other capabilities. Use `verify` after every injection to check perplexity drift.
- Concept capture quality depends on examples. 3–5 diverse examples is usually enough; 1 example captures the specific sample, not the concept.
- Portability requires matching architecture. A concept exported from `hlm3-large-ffn` does NOT import cleanly into `hlm3-medium` (different hidden dims). v1 cross-architecture import is on the roadmap.
- This is not RLHF. RLHF shifts the entire model's preference distribution via training. Basin surgery edits specific concepts addressably. Use the right tool for the right job.
Related
- Getting Started guide — if you haven't installed `qriton-hlm` yet
- First Surgery walkthrough — the guided tutorial version of this page
- Energy Language operations reference — all 36 commands
- HLM3 model page
- Tutorials → Custom Persona — the older detailed tutorial on this topic