What We Demonstrated
The demo is built around a simple logistics workflow involving humanoid and dual-arm robots. Humanoid robots perform mobile manipulation tasks, including object retrieval and transportation between stations. Dual-arm robots operate at fixed workstations, where they perform precise manipulation tasks such as sorting, packing, and workflow completion.
The two robot embodiments serve complementary roles. Humanoid robots provide mobility and can interact with human-designed environments, while dual-arm robots offer higher precision and efficiency for structured manipulation tasks. Together, they form an end-to-end workflow that combines mobility and dexterous manipulation.
More importantly, all robots in the demo are powered by the same model trained exclusively on human-centric data, without any robot-collected training data. The model is not tied to a specific embodiment and can be deployed across both humanoid and dual-arm platforms. This demonstrates that human-centric data can support mobile manipulation across different robot bodies, enabling a unified intelligence system for heterogeneous robotic platforms.
One Model, Multiple Bodies
A common direction in robotics is to optimize for a single robot embodiment capable of solving a broad range of tasks. In practice, however, different embodiments offer fundamentally different trade-offs in mobility, dexterity, precision, payload, cost, and deployment constraints. As a result, real-world robotic systems are likely to consist of heterogeneous platforms rather than a single universal robot.
Humanoid robots are effective in human-centric environments that require mobility and interaction with existing infrastructure. Dual-arm robots excel at structured manipulation tasks that demand precision, repeatability, and efficiency. Mobile dual-arm platforms combine the advantages of both, enabling manipulation over larger workspaces.
This diversity of embodiments creates a scalability challenge for robot learning. In the conventional paradigm, each robot requires its own data, model, and training pipeline, making capability transfer across platforms difficult.
Our goal is to decouple intelligence from embodiment. We separate the system into a high-level intelligence layer and a low-level control layer. The high-level model performs scene understanding, task reasoning, and behavior generation, while the low-level controller handles execution and embodiment-specific dynamics.
This architecture allows the same intelligence framework to operate across different robot bodies. In this demo, a shared model is deployed on both humanoid and dual-arm robots, despite their different morphologies and control requirements.
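A minimal sketch of this separation, under stated assumptions: the names `Intent`, `HumanoidController`, `DualArmController`, and `shared_policy` are hypothetical and do not describe the demo's actual interfaces. The sketch shows only the shape of the idea: one high-level policy emits an embodiment-agnostic target, and each robot's controller turns that target into its own commands.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class Intent:
    """Embodiment-agnostic output of the high-level layer (hypothetical)."""
    skill: str                               # e.g. "pick" or "place"
    target_xyz: Tuple[float, float, float]   # goal in a shared frame, metres


class LowLevelController(Protocol):
    """Anything that can turn an Intent into embodiment-specific steps."""
    def execute(self, intent: Intent) -> List[str]: ...


class HumanoidController:
    """Placeholder: a mobile body may need to walk before it can reach."""
    reach_m = 0.6  # assumed arm reach, metres

    def execute(self, intent: Intent) -> List[str]:
        x, y, _ = intent.target_xyz
        horizontal_distance = (x * x + y * y) ** 0.5
        steps = []
        if horizontal_distance > self.reach_m:
            steps.append("walk_to_target")
        steps.append(f"whole_body_{intent.skill}")
        return steps


class DualArmController:
    """Placeholder: a fixed workstation only moves its arms."""
    def execute(self, intent: Intent) -> List[str]:
        return [f"arm_{intent.skill}"]


def shared_policy(observation: Dict[str, Tuple[float, float, float]]) -> Intent:
    """Stand-in for the shared high-level model; the real one is learned."""
    return Intent(skill="pick", target_xyz=observation["object_xyz"])


obs = {"object_xyz": (1.2, 0.0, 0.8)}
intent = shared_policy(obs)                       # same intent for both robots
humanoid_plan = HumanoidController().execute(intent)
dual_arm_plan = DualArmController().execute(intent)
```

The same `intent` drives both controllers; only the controllers know anything about the body they run on.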
The key question is therefore: what kind of data can train an intelligence system that generalizes across robot embodiments?
Human-Centric Data as a Foundation for Generalization
The scalability of modern AI systems has been driven by large-scale data. Language models learn from text, and vision models learn from images and videos. For embodied intelligence, the key question is what data source can support learning at a similar scale while transferring across diverse robot embodiments.
Robot-collected data provides valuable supervision but faces fundamental scaling limitations. Data collection is expensive, embodiment-specific, and tightly coupled to particular hardware configurations. As a result, transferring capabilities across robot platforms often requires collecting new datasets and training new models.
More fundamentally, robot-collected data often requires humans to adapt to the robot rather than demonstrate the task naturally. Operators must work around the robot's kinematics, sensing, latency, and control constraints, making data collection slower and less efficient. As a result, the collected behaviors often reflect cautious and constrained robot operation rather than the fluid and efficient motions humans would normally use. These limitations are particularly severe for humanoid robots, where large-scale data collection requires coordinating locomotion, balance, perception, and manipulation.
Human-centric data offers an alternative scaling path. Human demonstrations capture task structure independently of any particular robot embodiment and are available at significantly larger scale than robot-collected data. Rather than learning from how a specific robot executes a task, the model learns from how humans solve the task itself.
In this work, we train a unified model using only human-centric data and deploy it on both humanoid and dual-arm robots. The results demonstrate that human-centric data can serve as a common foundation for learning transferable robot intelligence across embodiments, without relying on any robot-collected training data.
From Human Behavior to Robot Execution
Human-centric data is abundant and scalable, but using it for robot learning is significantly more challenging than using robot-collected data. Robot data is already expressed in the robot's action space and can be directly used for policy learning. Human data, in contrast, is generated by a body with different kinematics, dynamics, workspaces, motion patterns, and manipulation capabilities. Direct imitation is therefore not sufficient, and an explicit mechanism is required to bridge human behavior and robot execution.
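To make the mismatch concrete, here is a minimal sketch of one naive bridging step: rescaling a shoulder-relative human wrist trajectory by the ratio of arm lengths and clamping it to the robot's reach. The function `retarget_wrist`, the arm lengths, and the sample trajectory are illustrative assumptions, not the method used in this work. A real bridge must also account for dynamics, hand-versus-gripper differences, and whole-body constraints.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def retarget_wrist(
    human_wrist: List[Vec3],
    human_arm_len: float,
    robot_arm_len: float,
) -> List[Vec3]:
    """Scale shoulder-relative wrist positions to a robot arm, clamped to its reach."""
    scale = robot_arm_len / human_arm_len
    retargeted = []
    for point in human_wrist:
        q = tuple(scale * c for c in point)
        norm = math.sqrt(sum(c * c for c in q))
        # A demonstrator can lean or the tracker can be noisy, so a scaled
        # point may still fall outside the sphere the robot arm can reach.
        if norm > robot_arm_len:
            q = tuple(c * robot_arm_len / norm for c in q)
        retargeted.append(q)
    return retargeted


# Made-up wrist positions in metres, relative to the shoulder.
human_traj = [(0.30, 0.00, 0.00), (0.30, 0.40, 0.00), (0.00, 0.00, 0.70)]
robot_traj = retarget_wrist(human_traj, human_arm_len=0.60, robot_arm_len=0.45)
```

The last sample lies beyond the assumed human arm length, so after scaling it is projected back onto the robot's reach boundary. Geometric scaling of this kind preserves the shape of a motion but says nothing about whether the robot can execute it stably or on time.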
Turning human behavior into executable robot actions requires a system built to learn directly from human-centric data. We introduce four components, covering representation alignment, whole-body control, execution compensation, and hierarchical inference, that together make it possible to train and deploy robot policies on human-centric data end-to-end:
- Cross-Embodiment Data Pipeline. Human-centric data and robot action spaces differ significantly in kinematics, dynamics, and embodiment structure. We introduce a Cross-Embodiment Data Pipeline that converts large-scale human demonstrations into robot-executable action representations across diverse embodiments, effectively transferring human dexterity into robotic behavior.
- Whole-Body Action Foundation Model. Humanoid mobile manipulation requires coordinated control of locomotion and manipulation under full-body constraints. We learn a whole-body motion control foundation model from tens of thousands of hours of motion capture data, providing a unified low-level execution interface for tracking diverse motions while maintaining balance and physical feasibility. Teleoperation-based low-level controllers primarily emphasize spatial coverage, whereas policies trained on human-centric data depend on highly accurate low-level execution. Our whole-body model is therefore trained for precision: it achieves sub-3 cm end-effector tracking accuracy while maintaining global motion coherence and physical feasibility.
- Real-World Execution Compensation Model. A key challenge in real-world deployment is the sim-to-real gap, which makes controllers trained purely in simulation insufficient for reliable physical execution. To address this, we introduce a lightweight compensation model trained on a small amount of real robot deployment data. The model corrects tracking errors, dynamics mismatch, and embodiment-specific deviations, enabling robust real-world execution. On the Unitree G1 platform, where arm precision is relatively limited, the compensation model achieves sub-1 cm manipulation accuracy.
- Hierarchical Coordination Reasoning. Robot execution suffers from perception and control latency at the low level, while human-centric data is inherently delay-free. This creates a fundamental mismatch: policies trained purely by imitating human demonstrations cannot account for real-world execution delays. To address this, we introduce a hierarchical coordination reasoning framework that enables cross-level interaction between high-level inference and low-level control. Instead of directly imitating human-centric data, the high-level policy reasons over real-time low-level feedback and adaptively decides when and how to invoke low-level skills, ensuring consistent execution under latency-constrained physical systems.
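The latency mismatch described in the last item can be illustrated with a toy one-dimensional example. Everything here is an assumption made for illustration: `LaggedArm` is a made-up low-level controller that closes only half of the remaining gap per tick, and the two runners contrast replaying demonstration timing with waiting on low-level feedback. It is a sketch of the idea, not the coordination framework itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LaggedArm:
    """Toy 1-D low-level controller that closes a fixed fraction of the gap per tick."""
    position: float = 0.0
    gain: float = 0.5  # assumed lag; 1.0 would be delay-free tracking

    def step(self, target: float) -> float:
        self.position += self.gain * (target - self.position)
        return abs(target - self.position)  # tracking error reported upward


def run_open_loop(waypoints: List[float], arm: LaggedArm) -> List[float]:
    """Replay the demonstration timing: one tick per waypoint, feedback ignored."""
    return [arm.step(w) for w in waypoints]


def run_with_feedback(
    waypoints: List[float], arm: LaggedArm, tol: float = 0.01, max_ticks: int = 50
) -> List[float]:
    """Move on to the next waypoint only once the low level reports arrival."""
    final_errors = []
    for w in waypoints:
        err = abs(w - arm.position)
        for _ in range(max_ticks):
            err = arm.step(w)
            if err <= tol:
                break
        final_errors.append(err)
    return final_errors


waypoints = [0.2, 0.4, 0.1]  # made-up targets, metres
open_loop_errors = run_open_loop(waypoints, LaggedArm())
feedback_errors = run_with_feedback(waypoints, LaggedArm())
```

In the open-loop run the arm never catches up with any waypoint, while the feedback run reaches each one within tolerance at the cost of extra ticks. Deciding when to advance based on what the low level reports is the behavior the hierarchical framework is described as providing.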
Together, these components define a system that maps human-centric demonstrations to executable robot behavior. The framework enables learning from human data without relying on robot-specific teleoperation datasets and supports deployment across heterogeneous robot embodiments.
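As one illustration of the execution compensation component, the sketch below fits a per-axis affine correction from a small log of commanded versus achieved positions, then inverts it to choose a command. The affine form, the function names `fit_affine` and `compensate`, and the synthetic log are all assumptions made for the example. The compensation model described above is learned and is likely richer than a straight-line fit, and the numbers here are not measurements from any robot.

```python
from typing import List, Tuple


def fit_affine(commanded: List[float], achieved: List[float]) -> Tuple[float, float]:
    """Least-squares fit of achieved = a * commanded + b for a single axis."""
    n = len(commanded)
    mean_c = sum(commanded) / n
    mean_y = sum(achieved) / n
    cov = sum((c - mean_c) * (y - mean_y) for c, y in zip(commanded, achieved))
    var = sum((c - mean_c) ** 2 for c in commanded)
    a = cov / var
    b = mean_y - a * mean_c
    return a, b


def compensate(desired: float, a: float, b: float) -> float:
    """Invert the fitted model: the command expected to land on `desired`."""
    return (desired - b) / a


# Synthetic deployment log (metres): this pretend arm undershoots by 5%
# and sags by 1 cm. These values are invented for the example.
commanded = [0.10, 0.20, 0.30, 0.40]
achieved = [0.95 * c - 0.01 for c in commanded]

a, b = fit_affine(commanded, achieved)
command = compensate(0.25, a, b)   # command to send in order to reach 0.25 m
landed = 0.95 * command - 0.01     # where the synthetic arm would end up
```

A handful of logged samples is enough to recover the systematic error in this toy case, which shows why a small amount of real deployment data can go a long way when the base controller is already close.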
What Comes Next
This demo represents an early stage of our system. We are scaling human-centric data, improving model capability, and extending deployment to additional robot embodiments, including mobile dual-arm robots and other platforms that combine mobility and manipulation.
As the model improves, we expect faster and more reliable execution, stronger performance on long-horizon tasks, and broader deployment across logistics, industrial, and commercial environments. Our goal is not to build a separate intelligence for every robot, but a shared intelligence that can operate across heterogeneous robot bodies.