A critical prerequisite for generalizable robot control is a large-scale robot training dataset. Because collecting realistic robotic data is expensive, recent studies have explored simulating and recording robot skills in virtual environments. While simulated data can be generated faster, at lower cost, and at larger scale, its applicability remains questionable due to the gap between simulated and realistic environments.
To advance Sim2Real generalization, we present DexScale, a data engine designed to perform automatic skill simulation and scaling for learning deployable robot manipulation policies. Specifically, DexScale ensures the usability of simulated skills by integrating diverse forms of realistic data into the simulated environment, preserving semantic alignment with the target tasks.
For each simulated skill in the environment, DexScale facilitates effective Sim2Real data scaling by automating domain randomization and adaptation. Tuned on the scaled dataset, the control policy achieves zero-shot Sim2Real generalization across diverse tasks, multiple robot embodiments, and widely studied policy model architectures, highlighting the importance of DexScale in advancing Sim2Real embodied intelligence.
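To make the idea of domain randomization concrete, the following is a minimal, generic sketch of how one simulated skill might be expanded into many randomized scene variants. It is not DexScale's implementation: the parameter names (`light_intensity`, `friction`, `camera_yaw_deg`, `object_xy`), their ranges, and the functions `sample_scene` and `scale_skill` are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneParams:
    """One randomized configuration of a simulated scene (hypothetical fields)."""
    light_intensity: float
    friction: float
    camera_yaw_deg: float
    object_xy: tuple


# Hypothetical sampling ranges; a real pipeline would choose these per task.
RANGES = {
    "light_intensity": (0.5, 1.5),
    "friction": (0.4, 1.2),
    "camera_yaw_deg": (-10.0, 10.0),
    "object_x": (-0.1, 0.1),
    "object_y": (-0.1, 0.1),
}


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw every scene parameter uniformly from its configured range."""
    def draw(key: str) -> float:
        low, high = RANGES[key]
        return rng.uniform(low, high)

    return SceneParams(
        light_intensity=draw("light_intensity"),
        friction=draw("friction"),
        camera_yaw_deg=draw("camera_yaw_deg"),
        object_xy=(draw("object_x"), draw("object_y")),
    )


def scale_skill(n_variants: int, seed: int = 0) -> list:
    """Produce n_variants randomized scenes for a single skill, reproducibly."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n_variants)]
```

In such a setup, each sampled `SceneParams` would be used to re-render or re-simulate the same skill trajectory, so the policy sees the skill under varied lighting, physics, and viewpoints. Fixing the seed keeps the generated dataset reproducible.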
Data Scaling for Object Grasping
Data Scaling for Object Manipulation & Object Re-arrangement
Data Scaling for Object Manipulation
Data Scaling for Object Re-arrangement
Data Scaling for Object Grasp-then-Manipulation
1. Success rates of imitation policies learned from different datasets under various Sim2Real gaps. For the first eight domain gaps, we employ the transformer-based policy Zhao2023ACT to tackle grasping tasks. For the last two domain gaps, we use the diffusion-based policy Chi2023DiffusionPolicy to address the open-box task.
2. Robot control performance on different tasks in both realistic (upper) and simulated (lower) environments. The Re-Orientation task is tackled by a diffusion-based Vision-Language-Action (VLA) model Liu2024RDT trained on simulated data.
3. Real-world deployment videos of policies trained on DexScale datasets for different tasks.
Object Grasp
Object Manipulation
Object Rearrangement
Object Grasp-then-Manipulation
@inproceedings{liu2025dexscale,
  title={DexScale: Automating Data Scaling for Sim2Real Generalizable Robot Control},
  author={Guiliang Liu and Yueci Deng and Runyi Zhao and Huayi Zhou and Jian Chen and Jietao Chen and Ruiyan Xu and Yunxin Tai and Kui Jia},
  booktitle={International Conference on Machine Learning, ICML},
  year={2025},
}