HLM-Audio

HLM-Audio applies polynomial Hopfield networks to speech processing. Basin surgery enables direct editing of how the model processes audio patterns — every basin is a spectral or temporal pattern the model has learned.

Status

The validated demo path is HLM-TTS-Mix on LJSpeech, with a val_loss of 0.3845. Speech recognition (STT) is still in validation and is not yet presented as a public headline benchmark.

Capabilities

Model | Task | Metric
STT (Speech-to-Text) | Transcription with editable acoustic models | In validation
TTS-Mix (Text-to-Speech) | Synthesis with programmable voice characteristics | val_loss 0.3845

How Audio Basins Work

In a language model, basins represent semantic patterns. In an audio model, basins represent spectral and temporal patterns: phonemes, prosody, speaker characteristics, and acoustic features.

Layer depth | What basins represent
Early layers (0–2) | Spectral primitives: frequency bands, onset detection, harmonics
Middle layers (3–4) | Phoneme-level patterns: vowels, consonants, transitions
Deep layers (5+) | High-level features: words, prosody, speaker identity
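
To make the idea of a basin concrete, here is a minimal toy sketch of a polynomial Hopfield network (a dense associative memory) over ±1 states. It is not the HLM-Audio implementation: the cubic interaction function, the 8-element patterns, and the names `energy` and `settle` are all assumptions chosen for illustration. Each stored pattern sits at the bottom of an energy well, and a noisy input rolls back to the nearest one.

```python
# Toy polynomial Hopfield network over +/-1 states.
# Illustrative only: pattern size, degree, and function names are assumptions,
# not part of HLM-Audio or qriton_hlm.

def energy(state, patterns, degree=3):
    """E(s) = -sum over stored patterns of (pattern . s) ** degree."""
    total = 0
    for pattern in patterns:
        overlap = sum(p * s for p, s in zip(pattern, state))
        total += overlap ** degree
    return -total

def settle(state, patterns, degree=3, max_sweeps=10):
    """Asynchronously set each unit to whichever sign gives lower energy."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            up = state[:i] + [1] + state[i + 1:]
            down = state[:i] + [-1] + state[i + 1:]
            e_up = energy(up, patterns, degree)
            e_down = energy(down, patterns, degree)
            if e_up < e_down:
                new_value = 1
            elif e_down < e_up:
                new_value = -1
            else:
                new_value = state[i]  # tie: leave the unit unchanged
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break  # fixed point reached: the state sits inside a basin
    return state

# Two orthogonal stored patterns; each one is the bottom of a basin.
PATTERN_A = [1, 1, 1, 1, -1, -1, -1, -1]
PATTERN_B = [1, -1, 1, -1, 1, -1, 1, -1]
PATTERNS = [PATTERN_A, PATTERN_B]

# PATTERN_A with its first unit flipped.
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
recovered = settle(noisy, PATTERNS)
print(recovered == PATTERN_A)
print(energy(noisy, PATTERNS), energy(recovered, PATTERNS))
```

In this toy picture, removing or injecting a basin corresponds to deleting or adding an energy well, which changes where nearby inputs settle.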

Surgery on Audio Models

STT — Editing Speech Recognition

Modify how the model transcribes speech:

python
from qriton_hlm import BasinSurgeon

surgeon = BasinSurgeon.from_checkpoint("hlm-audio-stt.pt", device="cuda")

# Survey acoustic basins at the phoneme level
survey = surgeon.survey(layer=3)
print(f"{survey['num_basins']} acoustic basins found")

# Remove a pattern causing misrecognition
surgeon.remove(layer=3, seed=12, strength=0.1)

# Inject a new acoustic pattern for a specific phoneme
surgeon.inject(layer=3, seed=33, strength=0.1)

# Verify and apply
v = surgeon.verify(layer=3, seed=33)
print(f"Basin created: {v['is_basin']}  cos={v['cos']:.4f}")

surgeon.apply(layer=3)
result = surgeon.benchmark()
# The benchmark result is read from the 'perplexity' key, so label it as such
print(f"Perplexity after surgery: {result['perplexity']:.2f}")
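
For speech recognition, a common way to judge whether an edit helped is word error rate (WER), measured on the same utterances before and after surgery. The sketch below is a self-contained WER function based on word-level edit distance. It does not use or assume anything about `qriton_hlm`, and the function name and sample transcripts are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical transcripts of one utterance before and after an edit.
reference = "the cat sat on the mat"
before = "the bat sat on mat"
after = "the cat sat on the mat"
print(f"WER before: {word_error_rate(reference, before):.3f}")
print(f"WER after:  {word_error_rate(reference, after):.3f}")
```

Averaging this over a held-out set, rather than a single utterance, gives a more reliable picture of whether a removal or injection should be kept.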

TTS — Programming Voice Characteristics

Edit how the model synthesizes speech:

python
from qriton_hlm import BasinSurgeon

surgeon = BasinSurgeon.from_checkpoint("hlm-audio-tts.pt", device="cuda")

# Survey voice characteristic basins
survey = surgeon.survey(layer=5)

# Strengthen a prosody pattern for more natural intonation
surgeon.strengthen(layer=5, seed=7, factor=1.5)

# Weaken an unwanted artifact pattern
surgeon.weaken(layer=5, seed=19, factor=0.5)

# Use causal analysis to find which patterns shape voice quality
graph = surgeon.causal_scan(layer=5, threshold=0.15)
for edge in graph['edges']:
    print(f"  B{edge['source']} -> B{edge['target']}  drift={edge['drift']:.3f}")

surgeon.apply(layer=5)
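
Once a causal scan has returned its edges, a natural next step is to rank basins by how much they influence others. The sketch below does this on a plain dictionary with the same shape as the `graph` printed above (`edges` entries carrying `source`, `target`, and `drift`). The helper name, the sample values, and the choice to sum absolute drift are assumptions for illustration, not part of the library.

```python
from collections import defaultdict

def rank_by_outgoing_drift(graph):
    """Sum |drift| over each basin's outgoing edges and sort, largest first."""
    totals = defaultdict(float)
    for edge in graph["edges"]:
        # Assumption: the magnitude of drift measures influence, whatever its sign.
        totals[edge["source"]] += abs(edge["drift"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Made-up edges in the same shape as the causal_scan output shown above.
sample_graph = {
    "edges": [
        {"source": 7, "target": 19, "drift": 0.5},
        {"source": 7, "target": 3, "drift": 0.25},
        {"source": 19, "target": 3, "drift": 0.25},
    ]
}

for basin, total in rank_by_outgoing_drift(sample_graph):
    print(f"B{basin}: total outgoing drift {total:.2f}")
```

Basins near the top of such a ranking are reasonable first candidates to strengthen or weaken, since changes to them are likely to propagate furthest.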

Cross-Modal Comparison

Compare what language and audio models have learned:

python
from qriton_hlm import BasinSurgeon

language = BasinSurgeon.from_checkpoint("hlm3-model.pt")
audio = BasinSurgeon.from_checkpoint("hlm-audio-stt.pt")

# Same architecture, different modalities — compare basin landscapes
diff = language.compare(audio, layer=3)
print(f"Shared basins: {diff['shared']}")
print(f"Language-only: {diff['only_self']}")
print(f"Audio-only: {diff['only_other']}")
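
This page does not say how `compare` decides that two basins are shared. As a purely illustrative assumption, the sketch below treats each basin as a vector and counts two basins as shared when their cosine similarity meets a threshold, matching greedily one to one. The function names, the threshold, and the sample vectors are hypothetical. Only the result keys mirror the ones printed above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def compare_basins(self_basins, other_basins, threshold=0.9):
    """Greedy one-to-one matching of basins whose cosine similarity >= threshold."""
    matched_other = set()
    shared = 0
    for a in self_basins:
        best_index, best_sim = None, threshold
        for j, b in enumerate(other_basins):
            if j in matched_other:
                continue  # each basin on the other side can be matched only once
            sim = cosine(a, b)
            if sim >= best_sim:
                best_index, best_sim = j, sim
        if best_index is not None:
            matched_other.add(best_index)
            shared += 1
    return {
        "shared": shared,
        "only_self": len(self_basins) - shared,
        "only_other": len(other_basins) - shared,
    }

# Made-up 2-D basin vectors for two models.
language_basins = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_basins = [[2.0, 0.0], [-1.0, 0.0]]
print(compare_basins(language_basins, audio_basins))
```

Greedy matching is the simplest choice and depends on iteration order. An optimal assignment would be a reasonable refinement if many basins sit close to the threshold.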

HLM Script — Audio Audit

bash
# audio_audit.hlm
load hlm-audio-stt.pt
info
survey-all
guard max-basins 200
guard min-basins 5

# Inspect phoneme-level layer
survey 3
landscape 3
causal scan 3 0.15

# Test an edit
inject 3 33 0.1
verify 3 33
diff 3
benchmark
restore 3

history

Run with: qriton-hlm --script audio_audit.hlm

Custom Training & Pilots

For speech and audio applications requiring custom model training, contact us.

We support:

  • Custom training on proprietary audio datasets (domain-specific vocabulary, accents, languages)
  • Voice customization — program specific voice characteristics via basin surgery
  • Pilot projects for STT/TTS applications with hands-on engineering support