DexScale: Automating Data Scaling for Sim2Real Generalizable Robot Skills

Abstract

A critical prerequisite for achieving generalizable robot control is the availability of a large-scale robot training dataset. Due to the expense of collecting realistic robotic data, recent studies have explored simulating and recording robot skills in virtual environments. While simulated data can be generated at higher speeds, lower costs, and larger scales, the applicability of such simulated data remains questionable due to the gap between simulated and realistic environments.

To advance Sim2Real generalization, we present DexScale, a data engine designed to perform automatic skill simulation and scaling for learning deployable robot manipulation policies. Specifically, DexScale ensures the usability of simulated skills by integrating diverse forms of realistic data into the simulated environment, preserving semantic alignment with the target tasks.

For each simulated skill in the environment, DexScale facilitates effective Sim2Real data scaling by automating the process of domain randomization and adaptation. Tuned on the scaled dataset, the control policy achieves zero-shot Sim2Real generalization across diverse tasks, multiple robot embodiments, and widely studied policy model architectures, highlighting its importance in advancing Sim2Real embodied intelligence.

Data Scaling Pipeline

As a data engine, DexScale takes task-descriptive data as input and generates a skill dataset to support Sim2Real transfer. This enables the zero-shot deployment of robot policies in realistic environments.
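The domain randomization step mentioned in the abstract can be illustrated with a minimal sketch. This is not DexScale's implementation: the parameter names (`light_intensity`, `camera_yaw_deg`, `object_xy`, `friction`), their ranges, and the helper functions `sample_scene` and `scale_skill` are all hypothetical, chosen only to show how one simulated skill might be expanded into many randomized variants.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SceneParams:
    """Hypothetical scene parameters for one simulated episode."""
    light_intensity: float  # relative brightness multiplier
    camera_yaw_deg: float   # camera yaw offset, in degrees
    object_xy: tuple        # object position offset on the table, in meters
    friction: float         # surface friction coefficient

# Illustrative ranges only; these values are assumptions, not from the paper.
RANGES = {
    "light_intensity": (0.5, 1.5),
    "camera_yaw_deg": (-10.0, 10.0),
    "object_offset_m": (-0.05, 0.05),
    "friction": (0.4, 1.0),
}

def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized scene uniformly from the ranges above."""
    lo, hi = RANGES["object_offset_m"]
    return SceneParams(
        light_intensity=rng.uniform(*RANGES["light_intensity"]),
        camera_yaw_deg=rng.uniform(*RANGES["camera_yaw_deg"]),
        object_xy=(rng.uniform(lo, hi), rng.uniform(lo, hi)),
        friction=rng.uniform(*RANGES["friction"]),
    )

def scale_skill(skill_name: str, n_variants: int, seed: int = 0) -> list:
    """Pair one skill with n_variants randomized scenes (seeded for repeatability)."""
    rng = random.Random(seed)
    return [{"skill": skill_name, "scene": sample_scene(rng)} for _ in range(n_variants)]

# Expand a single grasping skill into 100 randomized episode configurations.
dataset = scale_skill("object_grasping", n_variants=100)
```

In a real pipeline, each sampled configuration would be passed to the simulator to re-render and re-record the skill. Seeding the generator keeps a scaled dataset reproducible across runs.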

Examples of Data Scaling

Figures:

Data Scaling for Object Grasping

Data Scaling for Object Manipulation & Object Re-arrangement

Videos:

Data Scaling for Object Grasping

Data Scaling for Object Manipulation

Data Scaling for Object Re-arrangement

Data Scaling for Object Grasp-then-Manipulation

The figures and videos above illustrate example action trajectories for the tasks of object grasping, box manipulation, and table rearrangement. To demonstrate the scalability of DexScale, the control policies are deployed on different robots, including two single-arm robots and a dual-arm robot equipped with wrist-mounted cameras.

Experiment Results

1. Success rates of imitation policies learned from different datasets under various Sim2Real gaps. For the first eight domain gaps, we employ the transformer-based policy [Zhao2023ACT] to tackle grasping tasks. For the last two domain gaps, we use the diffusion-based policy [Chi2023DiffusionPolicy] to address the open-box task.

2. Robot control performance for different tasks in both realistic (upper) and simulated (lower) environments. The Re-Orientation task is tackled by a diffusion-based Vision-Language-Action (VLA) model [Liu2024RDT] trained on simulation data.