Focus-Then-Contact | Real-World RL for Contact-Rich Manipulation

Welcome to FTC

Abstract

Real-world reinforcement learning has shown strong potential for robotic manipulation, but contact-rich tasks still require substantial human-in-the-loop effort, especially under visual background changes and positional disturbances. We propose Focus-Then-Contact (FTC), a lightweight and low-cost framework that accelerates human-in-the-loop real-world RL for contact-rich manipulation. FTC combines a stable imitation-learning base policy with residual RL, introduces a keyframe-based affordance-guided dense reward to focus exploration on task-relevant contact regions, and optimizes the human intervention mechanism to avoid conflicts with online RL control. Across six real-world contact-rich tasks, FTC improves success rates, accelerates convergence, and enables more robust learning under real-world disturbances.

Method

A lightweight path from coarse focus to precise contact.

Real-world RL can master contact-rich manipulation, but physical interaction is expensive and sparse rewards make exploration inefficient. FTC reduces meaningless trial-and-error by combining an IL base policy, residual RL, keyframe-based affordance guidance, and human intervention designed for cluttered scenes.

01

IL base policy with residual RL

A frozen imitation-learning policy provides a stable action prior, while a lightweight residual policy learns real-time corrections.
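As a rough illustration of how a frozen base action and a learned correction might be combined, here is a minimal sketch. The function name `compose_action`, the residual `scale`, and the clipping bounds are assumptions made for this example and are not taken from the FTC implementation.

```python
from typing import List, Sequence


def compose_action(
    base_action: Sequence[float],
    residual: Sequence[float],
    scale: float = 0.1,   # hypothetical bound on how far the residual can move the action
    low: float = -1.0,    # hypothetical actuator limits
    high: float = 1.0,
) -> List[float]:
    """Add a scaled residual correction to the base action, then clip to limits."""
    if len(base_action) != len(residual):
        raise ValueError("base_action and residual must have the same length")
    combined = []
    for b, r in zip(base_action, residual):
        a = b + scale * r                      # base prior plus small correction
        combined.append(min(high, max(low, a)))  # keep the command within limits
    return combined


# Example: the base policy proposes a motion, the residual policy nudges it.
action = compose_action([0.5, -0.2], [1.0, -1.0], scale=0.1)
```

Keeping `scale` small means the residual can only refine the base action, so early exploration stays close to the behavior the imitation-learning policy already provides.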

02

Keyframe-based affordance reward

Goal keyframes from demonstrations provide dense visual guidance, helping the robot focus on regions that matter for task completion.
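One way to picture a dense, affordance-guided reward is sketched below. It assumes the goal keyframe has already been turned into a 2D affordance heatmap and that the gripper's pixel location is known; the heatmap representation, the function names, and the exponential shaping with `sigma` are all hypothetical choices for this example, not the reward FTC defines.

```python
import math
from typing import Sequence, Tuple


def affordance_centroid(heatmap: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return the weighted (row, col) centroid of a non-negative heatmap."""
    total = row_sum = col_sum = 0.0
    for r, row in enumerate(heatmap):
        for c, w in enumerate(row):
            total += w
            row_sum += r * w
            col_sum += c * w
    if total <= 0.0:
        raise ValueError("heatmap has no positive affordance mass")
    return row_sum / total, col_sum / total


def dense_reward(
    gripper_px: Tuple[float, float],
    heatmap: Sequence[Sequence[float]],
    sigma: float = 10.0,  # hypothetical length scale in pixels
) -> float:
    """Reward in (0, 1] that grows as the gripper nears the affordance centroid."""
    cr, cc = affordance_centroid(heatmap)
    dist = math.hypot(gripper_px[0] - cr, gripper_px[1] - cc)
    return math.exp(-dist / sigma)


# Example: a 3x3 heatmap whose affordance mass sits at the center pixel.
heatmap = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
reward_far = dense_reward((0, 0), heatmap)
reward_near = dense_reward((1, 0), heatmap)
```

Unlike a sparse success signal, this kind of reward changes at every step, so the policy gets feedback even before it reaches the contact region.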

03

Human-in-the-loop contact refinement

Expert interventions are integrated through a timed intervention window to avoid conflict with RL while preserving safety.
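The sketch below shows one possible reading of a timed intervention window: once the expert intervenes, the human command holds control for a fixed number of steps and the RL action is ignored until the window expires. The class name `InterventionWindow`, the `window_steps` parameter, and the choice to hold the last human command are assumptions for illustration only; the page does not specify how FTC implements the window.

```python
from typing import List, Optional, Sequence, Tuple


class InterventionWindow:
    """Arbitrates between policy and human actions using a fixed-length window."""

    def __init__(self, window_steps: int = 5) -> None:
        if window_steps < 1:
            raise ValueError("window_steps must be at least 1")
        self.window_steps = window_steps
        self.remaining = 0                      # steps of human control left
        self.last_human: Optional[List[float]] = None

    def select(
        self,
        policy_action: Sequence[float],
        human_action: Optional[Sequence[float]] = None,
    ) -> Tuple[List[float], bool]:
        """Return (action to execute, whether it came from the human)."""
        if human_action is not None:
            # A new intervention (re)starts the window.
            self.remaining = self.window_steps
            self.last_human = list(human_action)
        if self.remaining > 0 and self.last_human is not None:
            # Inside the window: the human command wins, the policy is ignored.
            self.remaining -= 1
            return list(self.last_human), True
        return list(policy_action), False


# Example: a two-step window triggered by one human command.
window = InterventionWindow(window_steps=2)
first = window.select([1.0])            # policy in control
second = window.select([1.0], [0.5])    # human takes over
third = window.select([1.0])            # still inside the window
fourth = window.select([1.0])           # control returns to the policy
```

Returning the `is_human` flag alongside the action lets a training loop label each transition by its source, which is one way to keep intervention data separate from policy data.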

How can we efficiently learn robust and precise control policies from an extremely limited amount of physical interaction?

Real-World Demos

Contact-rich atomic skills learned on real robots.

Hanging keychain and opening door

Hanging keychain

Opening door

Charger grasp-insertion and toy insertion

Charger grasp-insertion

Toy insertion

USB insertion

USB insertion: easy

USB insertion: hard

Training Process

Fast convergence with visible real-world improvement.

The original FTC demo page records complete training runs: the hard USB insertion task converges in about 45 minutes, while keychain hanging converges in about 20 minutes.

USB hard training

Keychain training

Robustness & OOD Generalization

From fixed training positions to unseen execution poses.

The USB out-of-distribution (OOD) cases show generalization from four training positions to eight test positions.

USB OOD case 1

USB OOD case 2