Robotics and Object Recognition

Object recognition with the HLM-Spatial stack, plus an embodied action path: simulated pick-and-place, Kinect live recognition, and optional physical arm integration.

Status

Demo-ready, with a menu of modes. The laptop-safe mode is the PyBullet simulation; the Kinect and physical-arm modes require connected hardware and drivers.

What the demo proves

The shared Hopfield core handles 3D and embodied perception as well as it handles language and 2D vision:

  • Object recognition from point-cloud frontends
  • Sim pick-and-place in a PyBullet Franka Panda scene (no hardware needed)
  • Live Kinect classification with RGB and depth fusion
  • OpenCV Kinect visual classifier for a color/depth/object overlay window
  • Physical arm hooks — myCobot driver, servo arm server, camera-to-base calibration
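
The camera-to-base calibration hook listed above comes down to a rigid transform from the camera frame into the arm's base frame. The sketch below assumes the calibration produces a 4x4 homogeneous matrix; the function name `camera_to_base` and the matrix values are hypothetical and are not taken from the project's scripts.

```python
from typing import List, Sequence

Matrix4 = List[List[float]]


def camera_to_base(T_base_cam: Matrix4, point_cam: Sequence[float]) -> List[float]:
    """Map a 3D point from the camera frame into the robot base frame."""
    x, y, z = point_cam
    homogeneous = [x, y, z, 1.0]
    # Multiply the first three rows of the 4x4 transform by [x, y, z, 1].
    return [sum(row[i] * homogeneous[i] for i in range(4)) for row in T_base_cam[:3]]


# Hypothetical calibration: camera rotated 180 degrees about its x-axis
# (looking straight down) and mounted 0.8 m above the base origin.
T_BASE_CAM: Matrix4 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.8],
    [0.0, 0.0, 0.0, 1.0],
]

# A point 0.6 m along the camera's optical axis lands about 0.2 m above the base plane.
grasp_target = camera_to_base(T_BASE_CAM, [0.1, 0.2, 0.6])
print(grasp_target)
```

In a real setup the matrix would come from a hand-eye calibration routine rather than being written by hand, and the resulting base-frame point is what a pick-and-place planner would consume.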

Available modes

Mode                          | Hardware            | Surface
Simulated pick-and-place      | none                | PyBullet scene; laptop-safe default
Cross-modal object/action web | none                | Gradio on port 7860
Kinect live recognition       | Azure Kinect        | Web on port 7863
OpenCV Kinect visual          | Azure Kinect        | Native window
Physical arm hooks            | myCobot / servo arm | Driver + server scripts
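
The mode table above can also be read as a small registry keyed on hardware requirements. The sketch below shows one way a launcher might filter modes by what is attached. The identifiers (`DemoMode`, `runnable_modes`, the mode and hardware keys) are hypothetical; only the surfaces and port numbers come from the table.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class DemoMode:
    name: str
    hardware: Optional[str]  # None means the mode runs on a bare laptop
    surface: str


# Rows mirror the mode table; the identifiers themselves are hypothetical.
MODES = [
    DemoMode("sim_pick_and_place", None, "PyBullet scene"),
    DemoMode("cross_modal_web", None, "Gradio on port 7860"),
    DemoMode("kinect_live", "azure_kinect", "Web on port 7863"),
    DemoMode("kinect_opencv", "azure_kinect", "Native window"),
    DemoMode("arm_hooks", "arm", "Driver + server scripts"),
]


def runnable_modes(attached: Set[str]) -> List[str]:
    """Return the names of modes whose hardware requirement is satisfied."""
    return [m.name for m in MODES if m.hardware is None or m.hardware in attached]


laptop_only = runnable_modes(set())
with_kinect = runnable_modes({"azure_kinect"})
print(laptop_only)
print(with_kinect)
```

With no hardware attached, only the two laptop-safe modes remain, which matches the simulation being the default path.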

What the reviewer sees

  • A PyBullet scene with simulated pick-and-place actions
  • A Gradio UI for cross-modal object/action exploration
  • Optional live RGB+depth classification when a Kinect is attached
  • Documented hooks for connecting a physical robot arm

Why a robotics surface matters

Modality breadth is the moat. Showing the same Hopfield core handle unordered 3D point sets and embodied actions makes the architecture story concrete in a way that a language-only demo cannot.
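
To make "unordered 3D point sets" concrete, the sketch below pairs a permutation-invariant descriptor with softmax retrieval in the style of a modern Hopfield network. It is a toy under stated assumptions: the descriptor (per-axis spread around the centroid), the two stored patterns, and the names `descriptor` and `hopfield_weights` are all hypothetical and do not describe the HLM-Spatial frontend or core.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def descriptor(points: Sequence[Point]) -> List[float]:
    """Permutation-invariant descriptor: per-axis spread around the centroid, L2-normalised."""
    n = len(points)
    centroid = [sum(p[a] for p in points) / n for a in range(3)]
    spread = [math.sqrt(sum((p[a] - centroid[a]) ** 2 for p in points) / n) for a in range(3)]
    norm = math.sqrt(sum(s * s for s in spread)) or 1.0
    return [s / norm for s in spread]


def hopfield_weights(query: List[float], memory: Dict[str, List[float]],
                     beta: float = 20.0) -> Dict[str, float]:
    """Softmax association weights over stored patterns: softmax(beta * <pattern, query>)."""
    scores = {label: beta * sum(a * b for a, b in zip(pattern, query))
              for label, pattern in memory.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


rng = random.Random(0)


def box(sx: float, sy: float, sz: float, n: int = 200) -> List[Point]:
    """Sample n points uniformly inside an axis-aligned box (toy stand-in for a scan)."""
    return [(rng.uniform(-sx, sx), rng.uniform(-sy, sy), rng.uniform(-sz, sz)) for _ in range(n)]


# Two stored patterns: a flat slab and a tall column (hypothetical classes).
memory = {
    "flat": descriptor(box(1.0, 1.0, 0.05)),
    "tall": descriptor(box(0.1, 0.1, 1.0)),
}

cloud = box(0.9, 1.1, 0.05)  # a new, roughly flat point cloud
shuffled = cloud[:]
rng.shuffle(shuffled)        # same points, different order

weights = hopfield_weights(descriptor(cloud), memory)
weights_shuffled = hopfield_weights(descriptor(shuffled), memory)
best = max(weights, key=weights.get)
print(best, weights, weights_shuffled)
```

Because the descriptor only uses sums over points, shuffling the cloud leaves the retrieval weights unchanged up to floating-point rounding, which is the property an order-free point-set frontend needs.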

Caveats

  • Live Kinect and physical-arm paths require connected hardware. The laptop-default path is the simulation.
  • Sim demos use a small ModelNet10-derived checkpoint; this is a doctrine result, not a SOTA claim.

Where it fits

The right demo for:

  • Engineering reviewers asking "does this work on real 3D data and not just text?"
  • Robotics or embodied-AI partners evaluating perception + action coverage
  • Buyers wanting to see the multimodal story end to end