Robotics and Object Recognition
Object recognition with the HLM-Spatial stack, plus an embodied action path: simulated pick-and-place, Kinect live recognition, and optional physical arm integration.
Status
Demo-ready menu. The laptop-safe mode is the PyBullet simulation; the Kinect and physical arm modes require connected hardware and drivers.
What the demo proves
The same shared Hopfield core that handles language and 2D vision also handles 3D and embodied perception:
- Object recognition from point-cloud frontends
- Sim pick-and-place in a PyBullet Franka Panda scene (no hardware needed)
- Live Kinect classification with RGB and depth fusion
- OpenCV Kinect visual classifier for a color/depth/object overlay window
- Physical arm hooks — myCobot driver, servo arm server, camera-to-base calibration
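
The camera-to-base calibration in the last bullet amounts to a rigid transform from the camera frame into the arm's base frame. A minimal sketch, assuming the calibration is stored as a 4x4 homogeneous matrix; the function name `camera_to_base` and all numeric values are hypothetical and are not taken from the project's calibration code:

```python
from typing import List, Sequence

Matrix4 = List[List[float]]


def camera_to_base(T_base_cam: Matrix4, p_cam: Sequence[float]) -> List[float]:
    """Map a 3D point from camera coordinates into arm-base coordinates."""
    x, y, z = p_cam
    homogeneous = (x, y, z, 1.0)
    # Only the first three rows are needed: the last row of a rigid
    # transform is always [0, 0, 0, 1].
    return [
        sum(T_base_cam[row][col] * homogeneous[col] for col in range(4))
        for row in range(3)
    ]


# Hypothetical calibration: the camera is rotated 180 degrees about its x-axis
# (looking straight down) and sits 0.8 m above the base origin, 0.3 m forward.
T_BASE_CAM: Matrix4 = [
    [1.0, 0.0, 0.0, 0.3],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.8],
    [0.0, 0.0, 0.0, 1.0],
]

# An object detected 0.75 m along the camera's optical axis (illustrative values).
grasp_target = camera_to_base(T_BASE_CAM, (0.1, 0.05, 0.75))
```

With a transform like this, a detection from the depth camera can be handed to an arm driver as a target in the arm's own coordinates.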
Available modes
| Mode | Hardware | Surface |
|---|---|---|
| Simulated pick-and-place | none | PyBullet scene; laptop-safe default |
| Cross-modal object/action web | none | Gradio on port 7860 |
| Kinect live recognition | Azure Kinect | Web on port 7863 |
| OpenCV Kinect visual | Azure Kinect | Native window |
| Physical arm hooks | myCobot / servo arm | Driver + server scripts |
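
The table above can be read as a small mode registry with a hardware requirement per entry. The sketch below shows one way a launcher could fall back to the laptop-safe default when the required hardware is absent. It is an illustration only: the `DemoMode` class, the `select_mode` helper, and the `connected` set are hypothetical, and how the actual menu detects hardware is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class DemoMode:
    name: str
    hardware: Optional[str]  # None means the mode runs on a bare laptop
    surface: str


# Entries copied from the table; the first one is the laptop-safe default.
MODES: List[DemoMode] = [
    DemoMode("Simulated pick-and-place", None, "PyBullet scene"),
    DemoMode("Cross-modal object/action web", None, "Gradio on port 7860"),
    DemoMode("Kinect live recognition", "Azure Kinect", "Web on port 7863"),
    DemoMode("OpenCV Kinect visual", "Azure Kinect", "Native window"),
    DemoMode("Physical arm hooks", "myCobot / servo arm", "Driver + server scripts"),
]


def available_modes(connected: Set[str]) -> List[DemoMode]:
    """Modes whose hardware requirement is met by the connected devices."""
    return [m for m in MODES if m.hardware is None or m.hardware in connected]


def select_mode(requested: str, connected: Set[str]) -> DemoMode:
    """Return the requested mode, or the simulation if it cannot run."""
    for mode in available_modes(connected):
        if mode.name == requested:
            return mode
    return MODES[0]
```

For example, `select_mode("Kinect live recognition", set())` returns the simulated pick-and-place entry, while passing `{"Azure Kinect"}` as the connected set returns the Kinect web mode.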
What the reviewer sees
- A PyBullet scene with simulated pick-and-place actions
- A Gradio UI for cross-modal object/action exploration
- Optional live RGB+depth classification when a Kinect is attached
- Documented hooks for connecting a physical robot arm
Why a robotics surface matters
Modality breadth is the moat. Showing the same Hopfield core handling unordered 3D point sets and embodied actions makes the architecture story concrete in a way that a language-only demo cannot.
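
One way to see how an associative memory can sit behind an unordered point set: pool the points with a symmetric operation so that their order does not matter, then retrieve the closest stored pattern with a softmax-weighted Hopfield update. The sketch below illustrates that general technique under stated assumptions and is not the HLM-Spatial implementation. The min/max descriptor, the two stored patterns, their labels, and the `beta` value are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def pooled_features(points: Sequence[Point]) -> List[float]:
    """Order-independent descriptor: per-axis max, then per-axis min."""
    maxima = [max(p[axis] for p in points) for axis in range(3)]
    minima = [min(p[axis] for p in points) for axis in range(3)]
    return maxima + minima


def hopfield_retrieve(
    query: Sequence[float], patterns: Sequence[Sequence[float]], beta: float = 8.0
) -> Tuple[List[float], List[float]]:
    """One modern-Hopfield update: softmax over beta-scaled dot products."""
    scores = [beta * sum(q * x for q, x in zip(query, pat)) for pat in patterns]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The retrieved state is the weighted combination of stored patterns.
    retrieved = [
        sum(w * pat[i] for w, pat in zip(weights, patterns))
        for i in range(len(query))
    ]
    return weights, retrieved


# Hypothetical stored patterns, in the same max/min layout as pooled_features.
LABELS = ["flat box", "tall box"]
PATTERNS = [
    [1.0, 1.0, 0.1, -1.0, -1.0, -0.1],
    [0.2, 0.2, 1.0, -0.2, -0.2, -1.0],
]


def classify(points: Sequence[Point]) -> str:
    weights, _ = hopfield_retrieve(pooled_features(points), PATTERNS)
    return LABELS[weights.index(max(weights))]


cloud = [(0.2, 0.1, 0.9), (-0.2, -0.2, -1.0), (0.1, 0.2, 1.0), (-0.1, 0.0, -0.5)]
shuffled = list(cloud)
random.Random(0).shuffle(shuffled)  # same points, different order
```

Because max and min are symmetric in their inputs, `classify(cloud)` and `classify(shuffled)` produce the same label. That order independence is the property that lets a fixed memory operate on raw point sets.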
Caveats
- Live Kinect and physical-arm paths require connected hardware. The laptop-default path is the simulation.
- Sim demos use a small ModelNet10-derived checkpoint; this is a doctrine result, not a SOTA claim.
Where it fits
The right demo for:
- Engineering reviewers asking "does this work on real 3D data and not just text?"
- Robotics or embodied-AI partners evaluating perception + action coverage
- Buyers wanting to see the multimodal story end to end
Related
- HLM-Spatial model card — architecture and validation context
- Spatial 3D Web — point-cloud browser demo
- Demos overview — full demo gallery