Gasgoo Munich- How big is the data gap for embodied intelligence? A simple grasping motion might require hundreds or even thousands of real-world demonstrations. Yet globally, the stock of truly usable, high-quality real-machine data totals only a few hundred thousand hours. The shortfall is measured in tens of millions of hours, or even billions.
How do we fill the gap?
The latest answer from tech giants: mobilize the masses and launch a crowdsourcing drive.
On June 10, Daimon Robotics and China Mobile announced a major partnership. The two will leverage China Mobile's vast network of hundreds of thousands of offline stores nationwide to build a distributed data collection network.
The first pilot base has already been established in Chenzhou, Hunan. Positioned as the world's first "5S store" for embodied data collection, it is set to begin regular operations on July 15.
This collaboration between Daimon and China Mobile targets the core bottleneck holding back embodied intelligence today. The extreme scarcity of high-quality real-machine data has become the Achilles' heel in the march toward the era of general-purpose robots.
Big Tech Turns to Crowdsourcing to Tackle the Data Bottleneck
Consider the contrast.
Training large language models (LLMs) means voraciously consuming existing text from the internet. Feeding them trillions of tokens is relatively straightforward.
Robots are different. They need to learn physical manipulation—how to grab a water glass, fold clothes, or pick up a spoon in a messy kitchen. This kind of data is virtually non-existent online.

Image Source: Daimon Robotics
What about collecting it fresh? The traditional method has professional engineers demonstrate actions over and over in a lab using teleoperation devices. It is costly, requiring hundreds or thousands of repetitions for a single motion. The settings are also monotonous, with fixed lighting and neatly arranged objects.
Models trained on this data often stumble when placed in a new environment.
Industry estimates put the current stock of high-quality real-world data in embodied intelligence at only about 500,000 hours. Yet bringing a single skill up to a deliverable standard requires 2,000 to 5,000 hours of training data. At that rate, the entire global stock would cover only about 100 to 250 skills.
Against this backdrop, Daimon and China Mobile's initiative to bring "data collection into households" is a welcome move.
According to the plan, the Chenzhou 5S store will integrate exhibition, data collection training, equipment supply, pre-sales and after-sales service, and data-model-scenario collaboration. After short-term training, ordinary citizens can don two-finger grippers, tactile gloves, and head-mounted cameras to become data collectors across five major scenarios, including home, logistics, and manufacturing.
The project will initially deploy 1,000 sets of equipment. At full capacity, annual output is expected to reach 1 million hours of real-world scenario data.
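The pilot's two headline numbers imply a per-set workload that is easy to check. The short Python sketch below is a back-of-envelope calculation only, not anything published by Daimon or China Mobile. It assumes output is spread evenly across all 1,000 sets and across every day of the year.

```python
# Back-of-envelope check of the Chenzhou pilot's stated targets.
# Inputs are the figures reported in the article; the even spread
# across sets and calendar days is an illustrative assumption.
SETS = 1_000               # equipment sets in the initial deployment
ANNUAL_HOURS = 1_000_000   # target annual output at full capacity, in hours of data
DAYS_PER_YEAR = 365        # assumes collection runs every day of the year

# Hours of data each set must produce per year and per day.
hours_per_set = ANNUAL_HOURS / SETS
hours_per_set_per_day = hours_per_set / DAYS_PER_YEAR

print(f"{hours_per_set:.0f} hours per set per year")
print(f"{hours_per_set_per_day:.1f} hours per set per day")
```

Under those assumptions, each set would need to log roughly 1,000 hours a year to reach the full-capacity figure. That works out to a little under three hours of recorded collection every day.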
Daimon and China Mobile are not alone. JD.com has launched a similar crowdsourced data collection project, aiming to solve the industry's "data drought" at the source.
In March, JD.com announced plans to build the world's largest and most comprehensive embodied intelligence data collection center. To do so, it will mobilize hundreds of thousands of people, including more than 100,000 of its own employees across various professions and 500,000 external industry professionals. In Suqian alone, more than 100,000 citizens will be recruited, covering over 100 specific scenarios ranging from homes, offices, and factories to logistics, stores, restaurants, healthcare, and sanitation.
The plan targets the accumulation of 5 million hours of real-world human video data within one year, breaking 10 million hours in two years, while simultaneously collecting 1 million hours of robot body data.

Image Source: Suqian Release
In mid-April, JD.com officially released its embodied intelligence data infrastructure, covering the full chain of "collection, storage, labeling, training, evaluation, simulation, and testing." The aim is to close the loop from data acquisition all the way through model testing.
At the same time, JD Cloud released its self-developed wearable ultra-high-definition collection terminal, the JoyEgoCam, and launched an embodied intelligence data trading platform. The initial batch includes 2,000 hours of high-precision annotated datasets.
The Suqian center alone can reportedly accommodate nearly 10,000 people working simultaneously. JD.com is also extending data collection into community life and onto industrial production lines, building a dual-track "community crowdsourcing + industrial fixed-point" collection model.
The true value of this data collection model can be viewed on multiple levels.
First, it drives down the cost of data collection.
By distributing tasks to the general public and leveraging real-life or work settings with lightweight equipment, crowdsourcing can significantly reduce costs while boosting efficiency. In contrast, teleoperation data is extremely expensive, often costing hundreds of dollars per hour.
Second, it offers a diversity of scenarios that laboratories cannot match.
Ten thousand collectors mean ten thousand room layouts, ten thousand lighting conditions, and ten thousand operational habits. This "wild" data from the real world, though less orderly, is precisely the "nutrient" needed to train a model's generalization capabilities.
After all, the world robots will face is inherently chaotic and diverse; we cannot expect them to learn everything in sterile labs.
Third, it opens a path from public participation to a commercial closed loop.
Collectors get paid, tech giants gain traffic and added value, and the embodied intelligence industry secures scarce data. This win-win structure turns data collection from a purely capital-intensive "money-burning project" into a self-sustaining cycle.
According to data released by JD.com in April, the daily processing volume of its data collection project had already reached hundreds of thousands of entries, with a data validity rate of 95% and an overall processing cost reduction of 60%.
Crowdsourcing Is Not a "Cure-All": No Single Model Can Quench the Data Thirst
Facing the embodied data gap, crowdsourcing can solve the problem of "volume," but not everything.
First, there is the issue of data quality. Operational habits and skill levels vary among collectors. Even with unified training, ensuring consistency is difficult. Moreover, the industry still lacks unified data formats and quality standards. Data collected by different companies is like people speaking different dialects—difficult to exchange and integrate.
Second, there is a ceiling on precision. Human physiological limitations, such as micro-tremors, fatigue, and reaction delays, mean that many fine-grained operations cannot be captured through crowdsourcing.
Furthermore, the sheer-manpower approach of crowdsourcing has its limits. When industry demand grows from "millions of hours" to "billions of hours," the number of collectors cannot expand indefinitely, and labor costs will inevitably rise.
In the long run, once robots truly enter homes and factories at scale, they will become the biggest source of data themselves—but that requires crossing the initial threshold first.

Image Source: Daimon Robotics
Because of these limitations, a consensus is forming in the industry: the future data solution will not be a "solo act" by any single route. Instead, multiple collection methods will coexist, forming a clear "data pyramid" architecture. Moving from bottom to top, data volume decreases, quality increases, and so do the difficulty and cost of collection.
At the bottom are internet video data and simulated synthetic data. They are relatively easy to acquire and massive in scale, but much of the content is noise, and above all they lack real physical contact information.
The industry view is that such data can be used heavily during the pre-training phase as a starting point for model training.
The layer above is "human-centric" real-world collection data, including crowdsourcing. This involves using lightweight equipment like handheld grippers to scale up the collection of operation sequences with real tactile feedback.
This model plays an important "bridge" role: the data comes from the real world with diverse scenarios and controllable costs, but the trade-off is high quality control costs and difficulty in capturing high-precision movements.
At the top, with the smallest volume, is real-machine data. It has the highest precision and is closest to the actual running state of robots, but collection efficiency is extremely low and costs are high. It is hard to scale and is better suited for tackling core tasks.
These technical routes are not about replacing one another, but about division of labor and synergy. Simulation provides scale, crowdsourcing provides reality, and teleoperation provides precision. Together, they push the model from "being able to see and recognize" to "being able to do and act."
Returning to the beginning, what does the collaboration between Daimon and China Mobile actually signify?
It is not the ultimate solution, but it opens a new door for the industry.
When citizens in Chenzhou don their collection gloves to "feed" data to robots during their daily chores, a nationwide "embodied intelligence infrastructure movement" involving public participation has quietly begun.