Real-World Reinforcement Learning · Contact-Rich Manipulation

Focus-Then-Contact

Speeding Up Robotic Contact-Rich Task Learning with Affordance-Guided Real-World Residual Reinforcement Learning

Guanren Qiao1, Ruixiang Ouyang1, Sheng Xu1, Ruixing Jin1, Yueci Deng2, Yunxin Tai2, Kui Jia1,2, Guiliang Liu1,†

1School of Data Science, The Chinese University of Hong Kong, Shenzhen   ·   2DexForce Co., Ltd.   ·   †Corresponding author



Abstract

Real-world reinforcement learning has shown strong potential for robotic manipulation, but contact-rich tasks still require substantial human-in-the-loop effort, especially under visual background changes and positional disturbances. We propose Focus-Then-Contact (FTC), a lightweight and low-cost framework that accelerates human-in-the-loop real-world RL for contact-rich manipulation. FTC combines a stable imitation-learning base policy with residual RL, introduces a keyframe-based affordance-guided dense reward to focus exploration on task-relevant contact regions, and optimizes the human intervention mechanism to avoid conflicts with online RL control. Across six real-world contact-rich tasks, FTC improves success rates, accelerates convergence, and enables more robust learning under real-world disturbances.

Method

A lightweight path from coarse focus to precise contact.

Real-world RL can master contact-rich manipulation, but physical interaction is expensive and sparse rewards make exploration inefficient. FTC reduces meaningless trial-and-error by combining an IL base policy, residual RL, keyframe-based affordance guidance, and human intervention designed for cluttered scenes.

01

IL base action with residual RL

A frozen imitation-learning policy provides a stable action prior, while a lightweight residual policy learns real-time corrections.
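As a rough illustration of this base-plus-residual composition, the minimal Python sketch below adds a scaled correction to the frozen base action and clips the result. The function name `compose_action`, the `residual_scale` and `action_limit` parameters, and the choice to pass the base action to the residual policy are assumptions made for this example; they are not taken from the FTC implementation.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def compose_action(
    obs: Sequence[float],
    base_policy: Callable[[Sequence[float]], Vector],
    residual_policy: Callable[[Sequence[float], Vector], Vector],
    residual_scale: float = 0.1,  # assumed hyperparameter: bounds the correction size
    action_limit: float = 1.0,    # assumed symmetric bound on each action dimension
) -> Vector:
    """Return the executed action: frozen base action plus a scaled residual."""
    # The IL policy is only queried here, never updated online.
    base = base_policy(obs)
    # The residual policy sees the observation and the base action (an assumption).
    delta = residual_policy(obs, base)
    if len(delta) != len(base):
        raise ValueError("base and residual actions must have the same length")
    # Add the scaled correction, then clip each dimension to the action bounds.
    return [
        max(-action_limit, min(action_limit, b + residual_scale * d))
        for b, d in zip(base, delta)
    ]


if __name__ == "__main__":
    # Toy stand-ins for the two policies, for illustration only.
    frozen_base = lambda obs: [0.5, -0.2]
    residual = lambda obs, base: [1.0, -1.0]
    print(compose_action([0.0], frozen_base, residual))
```

Keeping the residual small relative to the base action is one common way to let online RL refine contact behavior without discarding the stable prior; the scale used here is arbitrary.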

02

Keyframe-based affordance reward

Goal keyframes from demonstrations provide dense visual guidance, helping the robot focus on regions that matter for task completion.
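One way a keyframe-based dense reward could be computed is sketched below. It assumes that the current observation and each goal keyframe have already been mapped to feature vectors by some hypothetical encoder over the task-relevant image region, and it scores the current frame by its best cosine similarity to any keyframe. The similarity measure, the `dense_weight` and `success_bonus` values, and all function names are illustrative assumptions, not the reward defined by FTC.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyframe_reward(
    current_embedding: Sequence[float],
    keyframe_embeddings: Sequence[Sequence[float]],
    success: bool = False,
    dense_weight: float = 0.1,   # assumed weight of the dense shaping term
    success_bonus: float = 1.0,  # assumed sparse task-completion reward
) -> float:
    """Dense shaping from the closest goal keyframe, plus a sparse success bonus."""
    if not keyframe_embeddings:
        raise ValueError("at least one goal keyframe embedding is required")
    # Score the current frame against every keyframe and keep the best match.
    best_match = max(
        cosine_similarity(current_embedding, keyframe)
        for keyframe in keyframe_embeddings
    )
    return dense_weight * best_match + (success_bonus if success else 0.0)


if __name__ == "__main__":
    goal_keyframes = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings, illustration only
    print(keyframe_reward([0.9, 0.1], goal_keyframes))
```

The dense term gives the policy a signal on every step, so exploration is drawn toward states that look like the demonstrated contact region even before the first sparse success.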

03

Human-in-the-loop contact refinement

Expert interventions are integrated through a timed intervention window to avoid conflict with RL while preserving safety.
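The sketch below shows one possible reading of a timed intervention window: once the operator intervenes, the RL action is suppressed for a fixed number of control steps, so the policy does not immediately fight the correction. The class name `InterventionWindow`, the `window_steps` parameter, and the zero "hold" action used when the operator releases control inside the window are all assumptions made for illustration; the actual FTC mechanism may differ.

```python
from typing import List, Optional, Sequence, Tuple


class InterventionWindow:
    """Arbitrates between policy and operator actions over a fixed-length window."""

    def __init__(self, window_steps: int = 10) -> None:
        if window_steps < 1:
            raise ValueError("window_steps must be at least 1")
        self.window_steps = window_steps  # assumed window length, in control steps
        self._remaining = 0               # steps left in the currently open window

    def select(
        self,
        policy_action: Sequence[float],
        human_action: Optional[Sequence[float]] = None,
    ) -> Tuple[List[float], bool]:
        """Return (action to execute, True if the policy is currently overridden)."""
        # Any operator input opens the window, or restarts it if already open.
        if human_action is not None:
            self._remaining = self.window_steps
        if self._remaining > 0:
            self._remaining -= 1
            if human_action is not None:
                return list(human_action), True
            # Operator released control inside the window: hold still (assumption).
            return [0.0] * len(policy_action), True
        # Window closed: the policy action passes through unchanged.
        return list(policy_action), False


if __name__ == "__main__":
    window = InterventionWindow(window_steps=3)
    print(window.select([0.2], human_action=[1.0]))  # operator takes over
    print(window.select([0.2]))                      # still inside the window
```

The returned flag can be stored with each transition so that overridden steps are distinguishable from policy-driven ones during training.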

How can we efficiently learn robust and precise control policies with an extremely limited number of physical interactions?

Focus-Then-Contact framework overview

Real-World Demos

Contact-rich atomic skills learned on real robots.

Hanging keychain and opening door

Hanging keychain

Opening door

Charger grasp-insertion and toy insertion

Charger grasp-insertion

Toy insertion

USB insertion

USB insertion: easy

USB insertion: hard

Training Process

Fast convergence with visible real-world improvement.

The original FTC demo page records complete training processes: USB hard insertion converges in about 45 minutes, while keychain hanging takes about 20 minutes.

USB hard training

Keychain training

Robustness & OOD Generalization

From fixed training positions to unseen execution poses.

The USB OOD case shows generalization from four trained positions to eight test positions.

USB OOD case 1

USB OOD case 2