Use Case: Programmable AI Assistant Persona

Design a chatbot assistant whose tone / style / refusal behaviour is edited via Energy Language surgery instead of retraining or prompt engineering. Ship personality changes in milliseconds without touching the weights you didn't mean to touch.

Who this is for

  • Chatbot product leads whose deployed LLM needs a persona update (more formal / more supportive / less verbose) without a three-week fine-tuning cycle
  • LLM-app developers stuck between "fine-tune the whole model" and "stuff everything into the system prompt"
  • Brand / content teams who want to capture "what does our brand voice look like?" as a concept and inject it into their production model

The problem you're solving

The standard toolkit for persona control has three failure modes:

  1. Prompt engineering: brittle. A new system prompt that makes the assistant "formal" also subtly changes how it handles math, coding, refusals. Small prompt edits produce large behavioural shifts in surprising ways.
  2. Fine-tuning: expensive, slow, and global. You retrained to be more polite — now it's also worse at SQL. No way to localise the edit.
  3. LoRA adapters: cheaper but still black-box. You can't tell which behaviours the adapter altered without extensive evaluation, and you can't mix two personas at 50/50 without retraining.

Basin surgery fixes these. You capture the abstract concept of "polite" from a handful of examples, inject it as a discrete attractor in the energy landscape, blend at any ratio, export for reuse, remove if you don't like it. Every operation is addressable, reversible, and auditable.

What you'll build

A working chatbot where you can:

  1. Capture "polite" from 3–5 examples
  2. Inject it as a new basin at 10% strength (subtle) or up to 50% (dominant)
  3. Blend with "technical" at a 70/30 mix for a formal-but-technical assistant
  4. Export the persona and import it into a different HLM3 checkpoint
  5. Revert instantly if the edit made something else worse

All edits happen in milliseconds on a laptop.

The stack

Piece                                 Source
HLM3 (pre-trained language model)     HLM3 — waitlist for weights
HLM-Audio (optional voice output)     HLM-Audio
Energy Language CLI or Python API     qriton-hlm package
Your application layer                Flask / FastAPI / whatever — the API surface is standard Python

Walkthrough

Step 1 — Load a checkpoint and survey

bash
$ qriton-hlm -c hlm3-large-ffn.pt

hlm:hlm3-large-ffn> survey 5
  Layer 5: 47 basins found (200 inits, β=7.00)

hlm:hlm3-large-ffn> generate Hello, how can I help you today?
  Hi, I can help with that. What do you need?

Baseline response — neutral tone.

Step 2 — Capture the "polite" concept from examples

hlm:hlm3-large-ffn> capture 5 polite Thank you so much for your question, I'd be happy to help
  Captured L5 → concept 'polite' (1 sample)
  Energy: -12.34 | Basin: True (cos=0.97, 23 iters)

hlm:hlm3-large-ffn> capture 5 polite I truly appreciate you reaching out. Let me assist with that
  Captured L5 → concept 'polite' (2 samples; averaged)

hlm:hlm3-large-ffn> capture 5 polite Absolutely, it would be my pleasure to help you with that
  Captured L5 → concept 'polite' (3 samples; averaged)

The capture operation averages the settled states of each example in the landscape — it's not memorising the strings, it's finding the region of the energy landscape these texts collectively live in.
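Mechanically, you can picture capture as "settle each example, then average." Below is a toy numpy sketch of that idea; the quadratic energy, the `settle` dynamics, and the `basin_center` are illustrative stand-ins, not the actual qriton-hlm internals:

```python
import numpy as np

def settle(x, basin_center, steps=50, lr=0.1):
    """Toy settling dynamics: gradient descent on a quadratic
    energy E(x) = ||x - basin_center||^2 pulls x into the basin."""
    for _ in range(steps):
        grad = 2.0 * (x - basin_center)  # dE/dx
        x = x - lr * grad
    return x

def capture(example_embeddings, basin_center):
    """Average the settled states of the examples: the concept
    vector marks the region they collectively occupy."""
    settled = [settle(e, basin_center) for e in example_embeddings]
    return np.mean(settled, axis=0)

rng = np.random.default_rng(0)
center = np.ones(8)
# three noisy "polite" examples scattered around one basin
examples = [center + 0.5 * rng.standard_normal(8) for _ in range(3)]
concept = capture(examples, center)
# settled states cluster near the basin, so the average lands close to it
print(np.linalg.norm(concept - center) < 0.1)
```

The point of the average is robustness: any single example carries idiosyncrasies of its wording, but the settled states share the basin, so their mean isolates the concept.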

Step 3 — Inject as a new basin at controlled strength

hlm:hlm3-large-ffn> inject-concept 5 polite 0.1
  Before: 47 basins, concept is basin: False
  After:  48 basins (+1), concept is basin: True  (strength=0.1)
  >> Concept successfully injected!

hlm:hlm3-large-ffn> apply 5
  Layer 5 changes committed to model state.

hlm:hlm3-large-ffn> generate Hello, how can I help you today?
  Hi there — I'd be delighted to help. What do you have in mind?

The tone shifted. At strength 0.1 the edit is subtle; try 0.3 for strong, 0.5 for dominant.
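One way to think about the strength parameter: it interpolates between the untouched state and the injected concept. This is a minimal sketch under that assumption (linear interpolation is a guess at the semantics, not documented qriton-hlm behaviour):

```python
import numpy as np

def apply_strength(hidden, concept, strength):
    """Illustrative: pull the hidden state toward the injected concept.
    strength = 0 leaves it unchanged; strength = 1 lands on the concept."""
    return hidden + strength * (concept - hidden)

h = np.array([1.0, 0.0])   # stand-in for the current hidden state
c = np.array([0.0, 1.0])   # stand-in for the 'polite' concept vector
subtle = apply_strength(h, c, 0.1)   # small nudge toward the concept
strong = apply_strength(h, c, 0.5)   # halfway to the concept
print(subtle, strong)
```

Under this reading, low strengths bias generation without overriding it, which is why 0.1 reads as a tone shift rather than a rewrite.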

Step 4 — Verify & benchmark

hlm:hlm3-large-ffn> verify 5
  Basin count: 48 (was 47) ✓
  Other basins unchanged: 47/47 ✓
  Output coherence check: 100 samples generated, 100 parsed ✓
  Perplexity delta: +0.03 (baseline 48.3, now 48.33) — within noise ✓

The verify step checks that surgery didn't break other parts of the model. The perplexity delta is your canary for "did this edit cost me capability somewhere else?"
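The perplexity-delta canary is easy to replicate in your own eval harness. A self-contained sketch; the per-token NLL values and the 0.1 tolerance are made-up illustration numbers, not qriton-hlm defaults:

```python
import math

def perplexity(nlls):
    """Perplexity is exp(mean negative log-likelihood) over tokens."""
    return math.exp(sum(nlls) / len(nlls))

def within_noise(baseline_nlls, edited_nlls, tol=0.1):
    """Flag the edit if perplexity drifts by more than `tol`."""
    delta = perplexity(edited_nlls) - perplexity(baseline_nlls)
    return delta, abs(delta) < tol

baseline = [3.877, 3.870, 3.885]   # per-token NLLs, pre-surgery
edited   = [3.878, 3.871, 3.886]   # per-token NLLs, post-surgery
delta, ok = within_noise(baseline, edited)
print(round(delta, 3), ok)
```

Run this over a fixed held-out prompt set before and after every injection; a drifting delta tells you the edit leaked beyond its basin.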

Step 5 — Blend personas

hlm:hlm3-large-ffn> capture 5 technical The optimisation uses stochastic gradient descent
hlm:hlm3-large-ffn> capture 5 technical The dependency graph shows three unresolved cycles
hlm:hlm3-large-ffn> capture 5 technical This class inherits from BaseRepository and overrides save()

hlm:hlm3-large-ffn> blend 5 polite technical 0.7 0.3 --as formal_technical
  Blended L5: polite (0.7) + technical (0.3) → formal_technical
  New concept: formal_technical (3 basins)

hlm:hlm3-large-ffn> inject-concept 5 formal_technical 0.2
hlm:hlm3-large-ffn> apply 5

hlm:hlm3-large-ffn> generate Explain how quicksort works
  I'd be glad to walk you through it. Quicksort is a divide-and-conquer 
  algorithm that operates in O(n log n) on average...

That's formal tone + technical content, programmatically combined. No retraining, no prompt engineering, instant.
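The blend ratios behave like normalised weights over the component concepts. Here is a toy sketch with plain numpy vectors standing in for captured concepts (the vectors are illustrative, not real concept states):

```python
import numpy as np

def blend(concepts, weights):
    """Illustrative: a blended persona as the weighted sum of
    concept vectors, with weights normalised to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * c for wi, c in zip(w, concepts))

polite    = np.array([1.0, 0.0, 0.0])
technical = np.array([0.0, 1.0, 0.0])
formal_technical = blend([polite, technical], [0.7, 0.3])
print(formal_technical)   # 70% polite, 30% technical
```

Because the weights normalise, `0.7 0.3` and `7 3` describe the same mix; only the ratio matters.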

Step 6 — Export the persona for reuse

hlm:hlm3-large-ffn> export-concept formal_technical ./personas/formal_technical.qcon
  Exported: 3 basins, 3.2 KB, SHA-256: a1b2c3...

hlm:hlm3-large-ffn> exit

# In a different HLM3 checkpoint:
$ qriton-hlm -c hlm3-other-checkpoint.pt

hlm:hlm3-other> import-concept 5 ./personas/formal_technical.qcon
  Imported formal_technical (3 basins) into L5.
  Compatibility check: ✓ same architecture, ✓ compatible dim

hlm:hlm3-other> inject-concept 5 formal_technical 0.2
hlm:hlm3-other> apply 5
hlm:hlm3-other> generate Explain recursion
  [generates formal_technical-toned response on a different model]

Concepts are portable across HLM3 checkpoints. Train your brand voice once, deploy it across production + staging + per-customer variants.
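A .qcon file presumably carries enough metadata to drive the import-time compatibility check shown above. This is a hypothetical sketch of that check; the metadata keys (`arch`, `dim`) and the 4096/2048 dimensions are assumptions, not the real file format:

```python
def check_compatibility(export_meta, model_meta):
    """Illustrative import-time check: the concept loads only if the
    target checkpoint shares the architecture and hidden dimension."""
    problems = []
    if export_meta["arch"] != model_meta["arch"]:
        problems.append("architecture mismatch")
    if export_meta["dim"] != model_meta["dim"]:
        problems.append("hidden-dim mismatch")
    return len(problems) == 0, problems

exported = {"arch": "hlm3-large-ffn", "dim": 4096}   # hypothetical metadata
ok, _ = check_compatibility(exported, {"arch": "hlm3-large-ffn", "dim": 4096})
bad, why = check_compatibility(exported, {"arch": "hlm3-medium", "dim": 2048})
print(ok, bad, why)
```

The second call is the hlm3-medium case from the caveats below: different hidden dims, so the import is refused rather than silently corrupted.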

Step 7 — Revert if needed

hlm:hlm3-other> undo
  Reverted 1 operation: inject-concept 5 formal_technical 0.2

hlm:hlm3-other> undo
  Reverted 1 operation: import-concept 5 formal_technical

Surgery is reversible per-operation. A bad edit never has to stick.
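Per-operation reversibility is, conceptually, an undo stack: each edit records its inverse, and undo pops and replays the most recent one. A minimal illustrative sketch (not the qriton-hlm implementation):

```python
class SurgeryLog:
    """Illustrative per-operation undo stack for surgical edits."""
    def __init__(self):
        self.applied = []

    def record(self, name, inverse_fn):
        """Log an applied operation together with its inverse."""
        self.applied.append((name, inverse_fn))

    def undo(self):
        """Pop the most recent operation and run its inverse."""
        name, inverse_fn = self.applied.pop()
        inverse_fn()
        return name

basins = {"polite"}              # stand-in for the layer's basin set
log = SurgeryLog()
basins.add("formal_technical")   # the edit...
log.record("inject-concept 5 formal_technical 0.2",
           lambda: basins.discard("formal_technical"))  # ...and its inverse
print(log.undo())   # reverts the most recent operation
print(basins)       # back to the pre-edit state
```

Because every operation carries its own inverse, a sequence of edits unwinds in reverse order, which is why the transcript above undoes the injection before the import.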

Step 8 — Integrate with your application

python
from fastapi import FastAPI
from pydantic import BaseModel

from qriton_hlm import BasinSurgeon

class ChatRequest(BaseModel):
    prompt: str

app = FastAPI()

surgeon = BasinSurgeon.from_checkpoint('hlm3-large-ffn.pt')
# Apply the production persona once at startup
surgeon.import_concept(5, './personas/formal_technical.qcon')
surgeon.inject_concept(5, 'formal_technical', strength=0.2)
surgeon.apply(5)

@app.post('/chat')
async def chat(req: ChatRequest):
    return {'reply': surgeon.generate(req.prompt)}

The persona is now baked into the loaded model for the lifetime of the process. Swap personas at runtime with surgeon.revert() + import_concept(other_persona).

  • Basin surgery at strength > 0.3 can degrade other capabilities. Use verify after every injection to check perplexity drift.
  • Concept capture quality depends on examples. 3–5 diverse examples are usually enough; a single example captures that specific sample, not the concept.
  • Portability requires matching architecture. A concept exported from hlm3-large-ffn does NOT import cleanly into hlm3-medium (different hidden dims). v1 cross-architecture import is on the roadmap.
  • This is not RLHF. RLHF shifts the entire model's preference distribution via training. Basin surgery edits specific concepts addressably. Use the right tool for the right job.