Critical Document Audit
Paste or upload a policy, contract, incident report, or medical note. Extract findings, then audit the evidence spans with HLM3-Mix signals.
Status
Demo-ready browser workflow. The default checkpoint is the validated HLM3-Mix 35M K=16 (validation perplexity, PPL, of 10.66); the all-models edition exposes the bundle's broader catalog of language checkpoints as selectable audit models.
What the demo proves
The same Hopfield core that drives language generation produces span-level audit signals over a document:
- PPL, entropy, and prefix delta for each finding's evidence span
- A document hash so the audit can be replayed
- A clear fragile / review / stable status per finding
- Saved JSON and Markdown reports for downstream review
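The span-level signals and status labels listed above can be sketched in a few lines. This is a minimal illustration, not the demo's implementation: it assumes the model exposes per-token log-probabilities and predictive distributions for a span, it assumes "prefix delta" means the drop in span PPL when the preceding context is supplied, and the `status` thresholds and the choice to derive status from PPL alone are hypothetical.

```python
import math

def span_perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-probability (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def mean_entropy(token_distributions):
    """Mean Shannon entropy (in nats) of the per-token predictive distributions."""
    total = 0.0
    for dist in token_distributions:
        total += -sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_distributions)

def prefix_delta(ppl_without_prefix, ppl_with_prefix):
    """Assumed definition: how much the preceding context lowers the span's PPL."""
    return ppl_without_prefix - ppl_with_prefix

def status(ppl, review_at=20.0, fragile_at=50.0):
    """Map a span PPL to a label. Thresholds are hypothetical, not the demo's."""
    if ppl >= fragile_at:
        return "fragile"
    if ppl >= review_at:
        return "review"
    return "stable"
```

Under these assumptions, a span the model predicts confidently gets a low PPL and a "stable" label, while a span that is surprising even with its context scores higher and is routed to "review" or "fragile".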
What the reviewer sees
- A profile selector (contract / policy / incident / medical)
- The original document with findings extracted by deterministic rules
- Each finding's evidence span audited by HLM3-Mix
- A downloadable JSON of the full audit run
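To make the reviewer flow concrete, here is a rough sketch of deterministic rule extraction feeding a replayable JSON report. The rule patterns, field names, and report layout are all hypothetical and chosen for illustration; the demo's actual rule set and JSON schema are not described here. A real rule would likely capture a whole clause or sentence as the evidence span rather than a single keyword.

```python
import hashlib
import json
import re

# Hypothetical rules per profile: (rule id, pattern). Not the demo's rule set.
RULES = {
    "contract": [
        ("termination", re.compile(r"\bterminat\w*", re.IGNORECASE)),
        ("liability", re.compile(r"\bliab\w*", re.IGNORECASE)),
    ],
}

def extract_findings(text, profile):
    """Return one finding per rule match, with character offsets of the span."""
    findings = []
    for rule_id, pattern in RULES.get(profile, []):
        for m in pattern.finditer(text):
            findings.append(
                {"rule": rule_id, "start": m.start(), "end": m.end(), "span": m.group(0)}
            )
    return sorted(findings, key=lambda f: f["start"])

def build_report(text, profile):
    """Bundle the document hash, profile, and findings into one JSON-ready dict."""
    return {
        # Hashing the exact input bytes lets a later run confirm it audits the same text.
        "document_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "profile": profile,
        "findings": extract_findings(text, profile),
    }

doc = "Either party may terminate this agreement. Liability is capped at fees paid."
report = build_report(doc, "contract")
print(json.dumps(report, indent=2))
```

Because both the rules and the hash are deterministic, running the same document through the same profile yields an identical report, which is what makes the audit replayable.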
Two editions
| Edition | Models available | Use it for |
|---|---|---|
| Default | HLM3-Mix 35M K=16 only | Reliable client demo with the validated checkpoint |
| All-models | The full packaged checkpoint catalog | Side-by-side audit behavior across the bundle |
The all-models edition uses the same audit pipeline; it adds a sidebar checkpoint selector with the tier badge and caveat for each option. Non-validated checkpoints surface a clear warning.
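One way to picture the checkpoint selector is as a small catalog in which every entry carries its tier and caveat, and any entry outside the validated tier is labelled with a warning. The sketch below is an assumption about the shape of that data: the `Checkpoint` fields, the tier names, and the `example-experimental` entry are placeholders, not items from the actual bundle.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Checkpoint:
    name: str
    tier: str    # assumed labels: "validated" or "experimental"
    caveat: str

# Hypothetical catalog; only the first entry is named in this document.
CATALOG = [
    Checkpoint("HLM3-Mix 35M K=16", "validated", "Quality reference (val PPL 10.66)."),
    Checkpoint("example-experimental", "experimental", "Diagnostic only; not a quality claim."),
]

def selector_label(cp):
    """Build the sidebar label: name, tier badge, and a warning if not validated."""
    warning = "" if cp.tier == "validated" else " [WARNING: not validated]"
    return f"{cp.name} ({cp.tier}){warning}"

def default_checkpoint(catalog):
    """The default edition exposes only the validated checkpoint."""
    return next(cp for cp in catalog if cp.tier == "validated")

for cp in CATALOG:
    print(selector_label(cp), "-", cp.caveat)
```

Keeping the tier and caveat on the catalog entry itself, rather than in UI code, means every place that lists a checkpoint can show the same warning.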
What "all-models" reveals
Selecting different checkpoints lets a reviewer see how the audit signals shift with the underlying model. The validated checkpoint is the quality reference; other entries in the bundle are diagnostic surfaces with explicit in-UI caveats so a reviewer cannot mistake an experimental artifact for a quality claim.
Caveats
- The deterministic rules cover four profiles; broader risk schemas need a partner integration.
- Larger checkpoints in the bundle may require substantial VRAM. The UI flags this and offers a CPU fallback.
Where it fits
This is the right demo for compliance, legal, clinical, or operations reviewers who value evidence traceability over single-answer generation.
Related
- Critical AI Audit Showcase — automated multi-document variant with perturbation comparison
- Use case: EU AI Act compliance — narrative deployment for regulated decisioning
- Validation Summary — public benchmark numbers
- HLM3-Mix Model Lab — prompt-test the same catalog