Labs are clean, flat, and repeatable. Homes are not. Narrow hallways, uneven thresholds, soft rugs, pets, furniture that moves week to week—the distribution of real domestic spaces breaks assumptions that work beautifully on polished concrete floors. Over the past year we have been running humanoid mobility trials in actual residences to understand what “generalization” must mean outside simulation.
Our focus in this program was not choreographed demos. We cared about sustained traversal: entering a home, moving between rooms, stopping safely near people, and recovering when footing was uncertain. Each session generated long-horizon logs of proprioception, depth, and operator interventions, which we use both for training and for honest failure analysis.
What surprised us
Small geometry dominates. A two-centimeter door saddle or a cable routed along a baseboard produced more downtime than large-scale planning mistakes. Perception errors mattered, but often the bottleneck was contact-rich balance: knowing when to shorten stride, when to pause, and when to ask for help rather than commit to a risky step.
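The choice between shortening stride, pausing, and asking for help can be pictured as a simple fallback ladder. The sketch below is illustrative only and does not describe our controller: the `footing_confidence` score, the `paused_steps` counter, and every threshold are hypothetical names and values chosen for the example.

```python
from enum import Enum


class Action(Enum):
    NORMAL_STRIDE = "normal_stride"
    SHORT_STRIDE = "short_stride"
    PAUSE = "pause"
    ASK_FOR_HELP = "ask_for_help"


def choose_action(
    footing_confidence: float,
    paused_steps: int,
    short_below: float = 0.8,   # hypothetical threshold
    pause_below: float = 0.5,   # hypothetical threshold
    max_paused_steps: int = 3,  # hypothetical patience limit
) -> Action:
    """Pick the most conservative action the current footing estimate calls for.

    footing_confidence is an assumed score in [0, 1]; paused_steps counts how
    many consecutive control steps have already been spent paused.
    """
    if not 0.0 <= footing_confidence <= 1.0:
        raise ValueError("footing_confidence must be in [0, 1]")
    if footing_confidence >= short_below:
        return Action.NORMAL_STRIDE
    if footing_confidence >= pause_below:
        return Action.SHORT_STRIDE
    # Low confidence: wait briefly for a better estimate, then escalate
    # to a human instead of committing to a risky step.
    if paused_steps < max_paused_steps:
        return Action.PAUSE
    return Action.ASK_FOR_HELP
```

With these example thresholds, `choose_action(0.6, 0)` returns `Action.SHORT_STRIDE`, and `choose_action(0.3, 3)` escalates to `Action.ASK_FOR_HELP`. The point of the ordering is that the robot never skips a rung: it gets more cautious before it stops, and it stops before it asks.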
We also saw how much human context changes the task. The same floor plan poses a different problem when children or guests are present, or when objects have been left in unexpected places. Static maps are insufficient; mobility in homes is a social and temporal problem, not only a geometric one.
Implications for our stack
These field runs pushed us toward policies that reason over short horizons with strong fallbacks, logging infrastructure that makes rare events visible, and evaluation metrics that reward reliability over peak performance. We are sharing this note as a snapshot of that learning process—not as a claim that the problem is solved.
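To make "reliability over peak performance" concrete, here is a minimal sketch of how session logs might be scored. It assumes a hypothetical `Session` record; the field names and the choice of metrics are ours for illustration and are not the actual log schema or evaluation suite.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Session:
    """One in-home session (hypothetical schema for illustration)."""
    hours: float        # time spent attempting traversal
    interventions: int  # operator interventions logged
    completed: int      # room-to-room traversals finished unaided
    attempted: int      # room-to-room traversals attempted


def success_rates(sessions: Sequence[Session]) -> list[float]:
    """Per-session unaided success rate, skipping sessions with no attempts."""
    return [s.completed / s.attempted for s in sessions if s.attempted > 0]


def mean_success(sessions: Sequence[Session]) -> float:
    """Average-case view: easy homes can hide one bad home."""
    rates = success_rates(sessions)
    if not rates:
        raise ValueError("no sessions with attempted traversals")
    return sum(rates) / len(rates)


def worst_session_success(sessions: Sequence[Session]) -> float:
    """Reliability view: the score is set by the hardest session."""
    rates = success_rates(sessions)
    if not rates:
        raise ValueError("no sessions with attempted traversals")
    return min(rates)


def interventions_per_hour(sessions: Sequence[Session]) -> float:
    """Operator interventions normalized by total traversal time."""
    total_hours = sum(s.hours for s in sessions)
    if total_hours <= 0:
        raise ValueError("total hours must be positive")
    return sum(s.interventions for s in sessions) / total_hours
```

For two made-up sessions, `Session(2.0, 1, 9, 10)` and `Session(1.0, 2, 5, 10)`, the mean success rate is about 0.7 while the worst-session rate is 0.5 and the intervention rate is 1.0 per hour. A metric built on the last two numbers penalizes the difficult home instead of averaging it away.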
We will continue publishing results as our systems mature. If you are exploring humanoid deployment, domestic robotics, or large-scale embodied datasets, reach out—we are actively collaborating with teams who care about the messy middle between demo and product.