HLM-Audio
HLM-Audio applies polynomial Hopfield networks to speech processing. Basin surgery lets you directly edit how the model processes audio patterns: every basin is a spectral or temporal pattern the model has learned.
Status
The validated demo path is HLM-TTS-Mix on LJSpeech, with a validation loss (val_loss) of 0.3845. Speech recognition is still in validation and is not yet presented as a public headline benchmark.
Capabilities
| Model | Task | Metric / status |
|---|---|---|
| STT (Speech-to-Text) | Transcription with editable acoustic models | In validation |
| TTS-Mix (Text-to-Speech) | Synthesis with programmable voice characteristics | val_loss 0.3845 |
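
The STT example later in this document reports a WER (word error rate) after surgery. For reference, WER is conventionally the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. The sketch below is a minimal pure-Python version of that standard definition. It is independent of `qriton_hlm`, and the function name `word_error_rate` is chosen here for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Computing this on a fixed set of held-out utterances before and after an edit gives a like-for-like comparison that does not depend on how a benchmark result is labeled.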
How Audio Basins Work
In a language model, basins represent semantic patterns. In an audio model, basins represent spectral and temporal patterns: phonemes, prosody, speaker characteristics, and acoustic features.
| Layer depth | What basins represent |
|---|---|
| Early layers (0–2) | Spectral primitives: frequency bands, onset detection, harmonics |
| Middle layers (3–4) | Phoneme-level patterns: vowels, consonants, transitions |
| Deep layers (5+) | High-level features: words, prosody, speaker identity |
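
To make the term "basin" concrete: in a Hopfield-style network, a basin is the set of inputs that the update rule pulls toward the same stored pattern. The toy sketch below uses a polynomial interaction on ±1 vectors in pure Python. It illustrates the general idea only and is not HLM-Audio's architecture or the `qriton_hlm` API; names such as `recall` and `degree` are assumptions made for this example.

```python
import random


def overlap(a, b):
    """Dot product of two +/-1 vectors."""
    return sum(x * y for x, y in zip(a, b))


def recall(patterns, state, degree=3, max_steps=20):
    """Iterate a polynomial Hopfield update until the state stops changing.

    Each stored pattern votes with weight overlap ** (degree - 1);
    degree=2 recovers the classic Hopfield rule.
    """
    state = list(state)
    for _ in range(max_steps):
        weights = [overlap(p, state) ** (degree - 1) for p in patterns]
        field = [
            sum(w * p[i] for w, p in zip(weights, patterns))
            for i in range(len(state))
        ]
        new_state = [1 if f >= 0 else -1 for f in field]
        if new_state == state:
            break  # fixed point reached: the state sits at the bottom of a basin
        state = new_state
    return state


rng = random.Random(0)
patterns = [[rng.choice((-1, 1)) for _ in range(128)] for _ in range(4)]

# Corrupt one stored pattern by flipping 16 of its 128 entries.
noisy = list(patterns[2])
for i in rng.sample(range(128), 16):
    noisy[i] = -noisy[i]

recovered = recall(patterns, noisy)
print("recovered stored pattern:", recovered == patterns[2])
```

In this picture, the corrupted input starts inside the basin of the stored pattern and the update rule carries it back to that pattern. Surgery operations such as remove or inject can be read as deleting or adding such attractors.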
Surgery on Audio Models
STT — Editing Speech Recognition
Modify how the model transcribes speech:
from qriton_hlm import BasinSurgeon
surgeon = BasinSurgeon.from_checkpoint("hlm-audio-stt.pt", device="cuda")
# Survey acoustic basins at the phoneme level
survey = surgeon.survey(layer=3)
print(f"{survey['num_basins']} acoustic basins found")
# Remove a pattern causing misrecognition
surgeon.remove(layer=3, seed=12, strength=0.1)
# Inject a new acoustic pattern for a specific phoneme
surgeon.inject(layer=3, seed=33, strength=0.1)
# Verify and apply
v = surgeon.verify(layer=3, seed=33)
print(f"Basin created: {v['is_basin']} cos={v['cos']:.4f}")
surgeon.apply(layer=3)
result = surgeon.benchmark()
print(f"WER after surgery: {result['perplexity']:.2f}")

TTS — Programming Voice Characteristics
Edit how the model synthesizes speech:
from qriton_hlm import BasinSurgeon
surgeon = BasinSurgeon.from_checkpoint("hlm-audio-tts.pt", device="cuda")
# Survey voice characteristic basins
survey = surgeon.survey(layer=5)
# Strengthen a prosody pattern for more natural intonation
surgeon.strengthen(layer=5, seed=7, factor=1.5)
# Weaken an unwanted artifact pattern
surgeon.weaken(layer=5, seed=19, factor=0.5)
# Use causal analysis to find which patterns shape voice quality
graph = surgeon.causal_scan(layer=5, threshold=0.15)
for edge in graph['edges']:
    print(f" B{edge['source']} -> B{edge['target']} drift={edge['drift']:.3f}")
surgeon.apply(layer=5)

Cross-Modal Comparison
Compare what language and audio models have learned:
from qriton_hlm import BasinSurgeon
language = BasinSurgeon.from_checkpoint("hlm3-model.pt")
audio = BasinSurgeon.from_checkpoint("hlm-audio-stt.pt")
# Same architecture, different modalities — compare basin landscapes
diff = language.compare(audio, layer=3)
print(f"Shared basins: {diff['shared']}")
print(f"Language-only: {diff['only_self']}")
print(f"Audio-only: {diff['only_other']}")

HLM Script — Audio Audit
# audio_audit.hlm
load hlm-audio-stt.pt
info
survey-all
guard max-basins 200
guard min-basins 5
# Inspect phoneme-level layer
survey 3
landscape 3
causal scan 3 0.15
# Test an edit
inject 3 33 0.1
verify 3 33
diff 3
benchmark
restore 3
history

Run with: qriton-hlm --script audio_audit.hlm
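
The `guard max-basins 200` and `guard min-basins 5` lines in the script bound the number of basins. The same kind of check can be sketched in Python over basin counts such as the `num_basins` value returned by `survey`. This is a minimal illustration, not the script engine's implementation: it assumes the guards are inclusive bounds applied per layer, and `guard_violations` is a hypothetical helper name.

```python
def guard_violations(basin_counts, min_basins=5, max_basins=200):
    """Return a message for every layer whose basin count is out of bounds.

    basin_counts maps a layer index to its number of basins.
    Assumption: both bounds are inclusive and apply to each layer separately.
    """
    violations = []
    for layer, count in sorted(basin_counts.items()):
        if count < min_basins:
            violations.append(f"layer {layer}: {count} basins < min {min_basins}")
        elif count > max_basins:
            violations.append(f"layer {layer}: {count} basins > max {max_basins}")
    return violations


# Illustrative counts only; real values would come from a survey of each layer.
counts = {0: 250, 3: 42, 5: 3}
for message in guard_violations(counts):
    print(message)
```

An empty list means every layer is within bounds, which makes the helper easy to use as a pass/fail gate before applying an edit.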
Custom Training & Pilots
For speech and audio applications requiring custom model training, contact us.
We support:
- Custom training on proprietary audio datasets (domain-specific vocabulary, accents, languages)
- Voice customization — program specific voice characteristics via basin surgery
- Pilot projects for STT/TTS applications with hands-on engineering support