
A developer attached a physical LeRobot 101 robot arm to the OpenClaw coding agent and used Codex to bring the hardware online, calibrate the joints, and guide training runs that let the arm see and slowly grasp objects. The work moved the system beyond simulation into a modest real‑world testbed, demonstrating how code‑centric agents can turn teleoperation data into physical manipulation. This matters for builders trying to move prototypes into real deployments: the experiment shows that practical gaps and substantial engineering effort remain even when models automate much of the software work.
The LeRobot 101 kit, promoted by Hugging Face, includes two arms: a human‑operated controller arm with a handle and trigger, and a follower arm that carries a camera and mirrors the controller's movements. The developer used the follower’s camera as the agent’s visual input and teleoperated the controller arm to generate demonstration data. Each recording paired camera observations with the follower’s motions, and those pairs were used to train a model that maps vision to action.
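The pairing step can be pictured with a minimal sketch. It assumes camera frames and follower joint readings are logged separately, each with its own timestamp, and that every frame is matched to the joint reading closest in time. The record layout, the `pair_demonstrations` function, and the 50 ms `max_gap` tolerance are hypothetical. None of them come from the article or from LeRobot's own recording tools.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

# Hypothetical record types: (timestamp in seconds, payload).
Frame = Tuple[float, str]                       # payload: e.g. an image filename
JointReading = Tuple[float, Tuple[float, ...]]  # payload: follower joint positions


def pair_demonstrations(
    frames: Sequence[Frame],
    joints: Sequence[JointReading],
    max_gap: float = 0.05,
) -> List[Tuple[str, Tuple[float, ...]]]:
    """Pair each camera frame with the follower joint reading nearest in time.

    Frames with no joint reading within `max_gap` seconds are dropped, so the
    training set only holds well-synchronized (observation, action) pairs.
    """
    joint_times = [t for t, _ in joints]  # assumed sorted in ascending order
    pairs = []
    for t, image in frames:
        i = bisect_left(joint_times, t)
        # Candidates: the reading just before and the one at/after the frame time.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(joints)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(joint_times[j] - t))
        if abs(joint_times[best] - t) <= max_gap:
            pairs.append((image, joints[best][1]))
    return pairs
```

The tolerance matters. Without it, a frame captured while the joint log had a gap would be silently paired with a stale action, and the model would learn a mismatched mapping.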
Bringing the hardware online still demanded hands‑on debugging. The developer spent several hours connecting cables and calibrating joints, and at one point nearly overheated the motors by applying incorrect settings. With help from OpenClaw and Codex, the developer wrote Python code that chained several libraries to detect a red ball and actuate the claw’s gripper when the ball appeared. Getting there took iterative “vibe‑coding” sessions and repeated fixes for hallucination‑driven bugs tied to specific hardware quirks.
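A detect‑then‑grip loop of this kind might look roughly like the sketch below. The article does not name the libraries involved. A real pipeline would more likely apply OpenCV color thresholding to live camera frames. To keep the example self-contained, it uses only the standard library and represents a frame as nested lists of RGB tuples. The color thresholds, `trigger_fraction`, and the `FakeGripper` class are hypothetical placeholders for the arm's actual driver.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def is_red(p: Pixel, min_red: int = 150, max_other: int = 90) -> bool:
    """Crude RGB threshold: strong red channel, weak green and blue."""
    r, g, b = p
    return r >= min_red and g <= max_other and b <= max_other


def red_fraction(frame: Sequence[Sequence[Pixel]]) -> float:
    """Fraction of pixels in the frame that pass the red threshold."""
    total = sum(len(row) for row in frame)
    if total == 0:
        return 0.0
    red = sum(1 for row in frame for p in row if is_red(p))
    return red / total


class FakeGripper:
    """Hypothetical stand-in for the follower arm's gripper driver."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def step(frame: Sequence[Sequence[Pixel]], gripper: FakeGripper,
         trigger_fraction: float = 0.05) -> bool:
    """Close the gripper once enough of the frame looks like the red ball."""
    if not gripper.closed and red_fraction(frame) >= trigger_fraction:
        gripper.close()
    return gripper.closed
```

Fixed RGB cutoffs like these are fragile under changing lighting. That sensitivity is one source of the hardware‑specific debugging the developer describes.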
After the developer experimented with different training approaches, the agent helped compare error rates between runs and steer further training. The resulting setup could pick up and place objects and reproduce a simple “wave” motion. This shows that a workflow combining teleoperation data, model training, and iterative evaluation can stand in for hand‑crafted controllers on some tasks, though it still requires careful tuning.
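Comparing runs by error rate can be as simple as the sketch below. It assumes each training run is evaluated over a handful of episodes, each recorded as success or failure. The article does not say how errors were actually measured, so the run names, the boolean outcome format, and the `rank_runs` helper are all hypothetical.

```python
from typing import Dict, List, Tuple


def error_rate(outcomes: List[bool]) -> float:
    """Share of evaluation episodes that failed (True means success)."""
    if not outcomes:
        raise ValueError("no evaluation episodes recorded")
    return sum(1 for ok in outcomes if not ok) / len(outcomes)


def rank_runs(results: Dict[str, List[bool]]) -> List[Tuple[str, float]]:
    """Sort training runs from lowest to highest error rate."""
    rates = [(name, error_rate(outcomes)) for name, outcomes in results.items()]
    return sorted(rates, key=lambda item: item[1])
```

For example, `rank_runs({"run_a": [True, False, False, True], "run_b": [True, True, True, False]})` ranks `run_b` (0.25) ahead of `run_a` (0.5). With only a few physical trials per run, such rates are noisy, so small differences between runs say little on their own.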
This hands‑on experiment sits within a broader trend known as “code as policy,” first highlighted in a 2022 paper. Ken Goldberg’s research group at UC Berkeley, working with teams at NVIDIA, Carnegie Mellon University and Stanford, developed the CaP‑X benchmark to measure how well coding models program robots. The group also released related tooling: CaP‑Gym, an environment for controlling simulated and real robots, and CaP‑Agent0, an agentic framework designed to improve coding models’ performance on manipulation tasks.
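The “code as policy” idea can be made concrete with a minimal sketch. In that framing, the model writes an ordinary program that composes perception and motion primitives, rather than producing actions from learned weights. The robot API here (`find`, `move_to`, `close_gripper`, `open_gripper`) is hypothetical. It is not the CaP‑Gym interface, which the article does not describe.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FakeRobot:
    """Hypothetical robot API exposing perception and motion primitives."""
    objects: Dict[str, Tuple[float, float]]  # color -> (x, y) in the workspace
    log: List[str] = field(default_factory=list)

    def find(self, color: str) -> Optional[Tuple[float, float]]:
        return self.objects.get(color)

    def move_to(self, x: float, y: float) -> None:
        self.log.append(f"move_to({x}, {y})")

    def close_gripper(self) -> None:
        self.log.append("close_gripper")

    def open_gripper(self) -> None:
        self.log.append("open_gripper")


# Under code-as-policy, a function like this is what the language model writes:
# the policy is a readable program built from the robot's primitives.
def pick_and_place(robot: FakeRobot, color: str, target: Tuple[float, float]) -> bool:
    pos = robot.find(color)
    if pos is None:
        return False  # nothing to grasp; a generated policy must handle this branch
    robot.move_to(*pos)
    robot.close_gripper()
    robot.move_to(*target)
    robot.open_gripper()
    return True
```

Because the policy is plain code, a benchmark can run it against a simulator or a logging stub like this one and check the resulting command sequence.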
CaP‑X results showed Google’s Gemini outperforming Claude and ChatGPT on these programming tasks, suggesting that some multimodal models are currently better at turning tasks into code that drives physical actions. Researchers at Berkeley and NVIDIA are also working on compatibility and scalability: NVIDIA has run internal vibe‑coding hackathons, and Spencer Huang is collaborating with Goldberg to broaden hardware support. Together, these efforts underscore both the promise of code‑driven robot programming and the engineering work still needed to turn prototype rigs into robust deployments.