Feel the Force: Contact-Driven Learning from Humans
Zero-shot transfer of tactile human demonstrations to robotic manipulation


¹New York University  ²UC Berkeley  ³New York University Shanghai
*Equal Contribution

We present Feel the Force (FTF), a novel framework for learning force-sensitive manipulation from natural human demonstrations. FTF uses a low-cost tactile glove and third-person cameras to collect force-trajectory data, enabling zero-shot policy transfer without any robot data during training.

Abstract

Robots often struggle with fine-grained force control in contact-rich manipulation tasks. While learning from human demonstrations offers a scalable solution, visual observations alone lack the fidelity needed to capture tactile intent. To bridge this gap, we propose Feel the Force (FTF): a framework that learns force-sensitive manipulation from human tactile demonstrations.

FTF uses a low-cost tactile glove to measure contact forces and vision-based hand pose estimation to capture human demonstrations. These are used to train a closed-loop transformer policy that predicts robot end-effector trajectories and desired contact forces. At deployment, a PD controller modulates gripper closure to match the predicted forces, enabling precise and adaptive manipulation.

FTF generalizes across diverse force-sensitive tasks, achieving a 77% success rate across five manipulation scenarios, and demonstrates robustness to test-time disturbances—highlighting the benefits of grounding robotic control in human tactile behavior.

Overview

FTF Overview

Feel the Force (FTF) is a learning framework that enables robots to perform precise force-sensitive manipulation by learning from natural human interactions. Unlike previous methods that rely on teleoperation or vision-only demonstrations, FTF uses a tactile glove to capture human contact forces and trains a closed-loop policy that predicts both hand trajectories and desired contact forces.

At deployment, this policy is retargeted to a real robot (Franka Panda) using a shared representation of hand and robot keypoints, and executed with a PD controller that modulates the gripper to match predicted forces in real time. This enables zero-shot transfer from human to robot, allowing the robot to handle tasks like gently placing an egg or twisting a bottle cap with fine force control — without any robot data during training.

Method

1. Tactile Data Collection

Tactile Data Collection

We design a custom, low-cost tactile glove, inspired by AnySkin, to capture 3D contact forces during natural human manipulation. The glove streams high-frequency force data synchronized with stereo camera views of the hand and object interactions. A matching tactile sensor mounted on the robot's gripper replicates this force sensing at deployment.


The tactile glove places magnetometer-based sensors on the underside of the thumb to minimize occlusion. The sensor data is sampled at 200 Hz and downsampled to align with the 30 Hz visual frames. The resulting force readings are used to supervise a policy trained entirely on human demonstrations, without requiring robot data.
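As a concrete illustration, below is a minimal sketch of this timestamp alignment, assuming both streams carry sorted timestamps; the function and variable names are ours, not from the FTF codebase.

import numpy as np

def align_force_to_frames(force_ts, force_xyz, frame_ts):
    """Nearest-timestamp alignment of 200 Hz force samples to 30 Hz frames.

    force_ts:  (N,)   sorted sensor timestamps in seconds
    force_xyz: (N, 3) 3D contact-force readings
    frame_ts:  (M,)   sorted camera frame timestamps in seconds
    Returns an (M, 3) array with one force reading per frame.
    """
    idx = np.searchsorted(force_ts, frame_ts)        # insertion points
    idx = np.clip(idx, 1, len(force_ts) - 1)
    left, right = force_ts[idx - 1], force_ts[idx]   # neighboring samples
    idx -= (frame_ts - left) < (right - frame_ts)    # step back if the left one is closer
    return force_xyz[idx]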

2. Human-to-Robot Embodiment Transfer

Embodiment Transfer

We unify human and robot action spaces using a keypoint-based retargeting scheme. Human hand poses are extracted via triangulated keypoints from dual camera views and mapped onto the robot's end-effector pose.
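The triangulation itself is standard multi-view geometry. For reference, the sketch below shows linear (DLT) triangulation of a single keypoint from two calibrated views; it assumes known projection matrices and is a generic implementation, not code from the paper.

import numpy as np

def triangulate_keypoint(P1, P2, uv1, uv2):
    """DLT triangulation of one keypoint from two calibrated views.

    P1, P2:   (3, 4) camera projection matrices
    uv1, uv2: (2,)   pixel coordinates of the keypoint in each view
    Returns the (3,) point in the world frame.
    """
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]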


We compute robot position as the midpoint of thumb and index fingertip keypoints, and infer orientation via rigid-body transform between initial and current hand frames. The resulting pose is projected to a set of robot keypoints for policy learning, making the method embodiment-agnostic and transferable.
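A minimal sketch of this retargeting step might look as follows, using the Kabsch algorithm for the rigid-body rotation; all names are illustrative, and FTF's exact formulation may differ.

import numpy as np

def retarget_hand_to_effector(thumb_tip, index_tip, hand_kps, hand_kps_init):
    """Map triangulated hand keypoints to a robot end-effector pose.

    thumb_tip, index_tip:    (3,) fingertip positions
    hand_kps, hand_kps_init: (K, 3) current / initial hand keypoints
    Returns (position (3,), rotation (3, 3)).
    """
    # End-effector position: midpoint of the thumb and index fingertips.
    position = 0.5 * (thumb_tip + index_tip)

    # Orientation: rigid rotation from the initial to the current hand frame,
    # estimated with the Kabsch algorithm (SVD of the keypoint covariance).
    p = hand_kps_init - hand_kps_init.mean(axis=0)
    q = hand_kps - hand_kps.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))      # guard against reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return position, rotation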

3. Policy Learning

Transformer Policy Architecture

A Transformer-based architecture learns from historical trajectories of robot and object keypoints, along with force and gripper state inputs, to predict future motion and desired contact forces.


Each input point track is encoded through an MLP and fed into the Transformer as a token. The model is trained with a mean-squared-error loss on the predicted point tracks and force values, and exploits temporal smoothness through action chunking and exponential averaging.
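The paper does not spell out the exact averaging weights, but a common form of this temporal ensembling (as popularized by ACT) blends all action chunks that cover the current timestep with exponentially decaying weights. A hypothetical sketch:

import numpy as np

def ensemble_action(chunks, t, horizon, m=0.1):
    """Exponentially weighted average over overlapping action chunks.

    chunks:  dict {t0: (horizon, D) array} of chunk predictions made at step t0
    t:       current timestep
    m:       decay rate (a design choice; its value and sign are tunable)
    Returns the smoothed (D,) action for step t.
    """
    acts, wts = [], []
    for t0, chunk in chunks.items():
        if t0 <= t < t0 + horizon:             # chunk still covers step t
            acts.append(chunk[t - t0])
            wts.append(np.exp(-m * (t - t0)))  # down-weight older predictions
    w = np.asarray(wts)
    w /= w.sum()
    return (np.asarray(acts) * w[:, None]).sum(axis=0)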

4. PD Force Controller

At deployment, a PD controller modulates the robot’s gripper to track the predicted contact force in real time, ensuring robust and precise execution even under discrepancies in morphology or sensing.


The controller adjusts the gripper closure iteratively until the measured force matches the policy's predicted value. This forms a stable outer loop around the robot hardware, correcting for noise and delay in actuation.
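A single step of such an outer loop could look like the sketch below; the gains, control rate, and sign conventions are placeholder assumptions, not FTF's tuned values.

def pd_force_step(target_force, measured_force, prev_error, width,
                  kp=1e-3, kd=1e-4, dt=1 / 30):
    """One iteration of the outer force loop around the gripper.

    target_force:   force predicted by the policy (N)
    measured_force: force sensed at the gripper pad (N)
    width:          current gripper width command (m)
    kp, kd:         placeholder PD gains, not the values used in FTF
    Returns (new width command, current force error).
    """
    error = target_force - measured_force
    # Too little force -> close the gripper further; too much -> open it.
    width -= kp * error + kd * (error - prev_error) / dt
    return width, error

At each control cycle, the returned width is commanded to the gripper, and the loop repeats until the force error falls below a tolerance.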

Experiments

1. FTF Performance

Human Demonstration

Robot Manipulation (x4 speed)

2. Comparison to Baselines

TABLE I: Performance comparison of different gripper action spaces (policies trained on human demonstrations)

Task                 | FTF   | Binary Gripper                  | Continuous Gripper
Place bread on plate | 13/15 | 0/15                            | 0/15
Unstack cup          | 9/15  | 0/15                            | 0/15 (2 picked 3 cups)
Place egg in pot     | 13/15 | 0/15                            | 0/15
Place chips          | 10/15 | 0/15                            | 0/15
Twist bottle cap     | 13/15 | 11/15 (1/15 broke gripper pads) | 0/15

TABLE II: Performance comparison of different gripper action spaces (policies trained on robot teleoperation demonstrations)

Task                 | FTF  | Binary Gripper         | Continuous Gripper
Place bread on plate | 5/15 | 0/15                   | 3/15
Unstack cup          | 4/15 | 0/15 (6 picked 3 cups) | 0/15 (2 picked 2 cups)
Place egg in pot     | 0/15 | 0/15                   | 0/15
Place chips          | 3/15 | 0/15                   | 0/15
Twist bottle cap     | 9/15 | 12/15                  | 8/15

3. Failure Case Analysis of Baseline Policies (all demos at x4 speed)

Overforce-Induced Object Damage (Binary Gripper)

Binary gripper policies apply a fixed, full closing force regardless of the object, often crushing fragile items during manipulation.

Grasp Instability from Misaligned Distance Mapping (Continuous Gripper)

Continuous gripper policies incorrectly map human hand distances to robot gripper apertures, resulting in unstable grasps or missed pickups.

BibTeX

@misc{adeniji2025feelforcecontactdrivenlearning,
      title={Feel the Force: Contact-Driven Learning from Humans}, 
      author={Ademi Adeniji and Zhuoran Chen and Vincent Liu and Venkatesh Pattabiraman and Raunaq Bhirangi and Siddhant Haldar and Pieter Abbeel and Lerrel Pinto},
      year={2025},
      eprint={2506.01944},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2506.01944}, 
}