Real-World Reinforcement Learning · Contact-Rich Manipulation
¹School of Data Science, The Chinese University of Hong Kong, Shenzhen · ²DexForce Co., Ltd. · †Corresponding author
Real-world reinforcement learning has shown strong potential for robotic manipulation, but contact-rich tasks still require substantial human-in-the-loop effort, especially under visual background changes and positional disturbances. We propose Focus-Then-Contact (FTC), a lightweight and low-cost framework that accelerates human-in-the-loop real-world RL for contact-rich manipulation. FTC combines a stable imitation-learning base policy with residual RL, introduces a keyframe-based affordance-guided dense reward to focus exploration on task-relevant contact regions, and optimizes the human intervention mechanism to avoid conflicts with online RL control. Across six real-world contact-rich tasks, FTC improves success rates, accelerates convergence, and enables more robust learning under real-world disturbances.
Method
Real-world RL can master contact-rich manipulation, but physical interaction is expensive and sparse rewards make exploration inefficient. FTC cuts unproductive trial and error by combining four components: an IL base policy, residual RL, keyframe-based affordance guidance, and a human intervention mechanism designed for cluttered scenes.
A frozen imitation-learning policy provides a stable action prior, while a lightweight residual policy learns real-time corrections.
Goal keyframes from demonstrations provide dense visual guidance, helping the robot focus on regions that matter for task completion.
Expert interventions are integrated through a timed intervention window to avoid conflict with RL while preserving safety.
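The three components above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the FTC implementation: the page does not give the residual scale, the reward formula, or the window length, so compose_action, keyframe_reward, InterventionWindow, select_action, and all default values are hypothetical. The reward here assumes that observations and goal keyframes have already been encoded as feature vectors, and the window assumes that the residual is suppressed for a fixed number of steps after the human releases control.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

Vector = Sequence[float]


def compose_action(base: Vector, residual: Vector,
                   scale: float = 0.25, limit: float = 1.0) -> List[float]:
    """Add a scaled residual correction to the frozen base action, then clip.

    `scale` and `limit` are hypothetical values chosen for illustration.
    """
    return [max(-limit, min(limit, b + scale * r))
            for b, r in zip(base, residual)]


def keyframe_reward(feature: Vector, goal_keyframes: Sequence[Vector]) -> float:
    """Dense reward (assumed form): negative Euclidean distance from the
    current feature vector to the nearest goal-keyframe feature vector."""
    return -min(math.dist(feature, goal) for goal in goal_keyframes)


@dataclass
class InterventionWindow:
    """Keep the residual policy silent while the human acts and for
    `hold_steps` further steps after the human lets go (assumed behavior)."""
    hold_steps: int = 5
    remaining: int = 0

    def update(self, human_active: bool) -> bool:
        """Advance one control step; return True while the window is open."""
        if human_active:
            self.remaining = self.hold_steps  # (re)open the window
            return True
        if self.remaining > 0:
            self.remaining -= 1               # grace period after release
            return True
        return False


def select_action(base: Vector, residual: Vector,
                  human: Optional[Vector],
                  window: InterventionWindow) -> List[float]:
    """Pick the action to execute for one control step."""
    window_open = window.update(human is not None)
    if human is not None:
        return list(human)      # the expert overrides everything
    if window_open:
        return list(base)       # residual suppressed right after release
    return compose_action(base, residual)
```

In this sketch the human action always takes priority, the base action alone is executed during the grace period, and the residual correction only returns once the window has closed, which is one simple way to keep the expert and the online RL policy from issuing competing commands.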
Real-World Demos
Hanging keychain
Opening door
Charger grasp-insertion
Toy insertion
USB insertion: easy
USB insertion: hard
Training Process
The original FTC demo page records complete training runs: the hard USB insertion task converges in about 45 minutes, while keychain hanging takes about 20 minutes.
USB hard training
Keychain training
Robustness & OOD Generalization
The USB out-of-distribution (OOD) cases show generalization from four training positions to eight test positions.
USB OOD case 1
USB OOD case 2