CoRL 2024 Workshop on Mastering Robot Manipulation in a World of Abundant Data

Overview

Manipulation is a crucial skill for fully autonomous robots operating in complex, real-world environments. As robots move into dynamic, human-centric spaces, it is increasingly important to develop reliable and versatile manipulation abilities. With the availability of large datasets (e.g., RT-X) and recent advances in robot learning and perception (e.g., deep RL, diffusion, and language-conditioned methods), there has been significant progress in acquiring new skills, common-sense reasoning, and natural interaction in human-centric environments. These advances raise new questions about (i) the learning methods that best utilize abundant data to learn versatile and reliable manipulation policies and (ii) the modalities (e.g., visual, tactile) and sources (e.g., real-world, high-fidelity contact simulations) of training data for acquiring general-purpose skills. In this workshop, we aim to facilitate an interdisciplinary exchange between the communities in robot learning, computer vision, manipulation, and control. Our goal is to map out the potential and limitations of current large-scale data-driven methods and to discuss pressing challenges and opportunities in diversifying data modalities and sources for mastering robot manipulation in real-world applications.

Discussion Themes

Our workshop comprises two closely related themes with invited talks from experts in each.

Theme A: Learning Methods for Versatile and Reliable Manipulation

– What are the roles of RL, imitation learning, and foundation models in manipulation, and how do we best leverage these methods/tools to achieve human-like learning and refinement of manipulation skills?
– Is scaling with large models and diverse datasets the way toward acquiring general-purpose manipulation skills? How do we best exploit our prior knowledge to facilitate versatile but also reliable learning? What are some challenges arising from cross-embodiment learning?
– How can foundation models trained on large datasets reach high reliability (99.9+%) as required in many real-world (industrial) applications? What are some criteria for real-world deployment?
– Will the common sense/reasoning capability enabled by foundation models improve the robustness of robot learning algorithms in the long run?

Theme B: Data Collection and Sensor Modalities for General-Purpose Skill Acquisition

– We have seen a proliferation of LLMs and VLMs in the robot decision-making software stack. Which sensor data modalities are required for learning and reliable deployment of manipulation skills?
– When is tactile feedback required for manipulation, and how can it be combined with vision? Can we train gripper-agnostic foundation models for dexterous manipulation?
– What role does internet video data play, and is simulation necessary to generate synthetic data? How can we collect informative data in the real world and effectively combine it with synthetic data for “in-the-wild” task learning?
– How can manipulation datasets containing different data modalities be effectively combined for cross-embodiment learning?

Confirmed Speakers

Sergey Levine, University of California, Berkeley
Nathan Ratliff, NVIDIA
Danica Kragic, KTH
Ted Xiao, Google DeepMind
Shuran Song, Stanford University
Katerina Fragkiadaki, CMU
Mohsen Kaboli, BMW and TU/e
Carlo Sferrazza, University of California, Berkeley
Youngwoon Lee, Yonsei University

Program

Below is a tentative program for the workshop. Times are in CEST. Themes A and B each have a 10-minute introduction, a set of 20-minute invited talks, and a 40-minute moderated panel discussion. Between the two themes, we have a spotlight talks session and a poster session.

Theme A: Learning Methods for Versatile and Reliable Manipulation

09:00 – 09:10: Opening Remarks
09:10 – 09:20: Theme A Introduction
09:20 – 10:50: Theme A Invited Talks
10:50 – 11:10: Coffee Break
11:10 – 11:50: Theme A Panel Discussion

Spotlights and Poster Sessions

11:50 – 12:20: Spotlight Talks
12:20 – 13:50: Lunch Break and Poster Session

Theme B: Data Collection and Sensor Modalities for General-Purpose Skill Acquisition

13:50 – 14:00: Theme B Introduction
14:00 – 15:50: Theme B Invited Talks
15:50 – 16:10: Coffee Break
16:10 – 16:50: Theme B Panel Discussion
16:50 – 17:00: Concluding Remarks

Organizers

Angela Schoellig, Technical University of Munich and University of Toronto
Animesh Garg, Georgia Institute of Technology and NVIDIA
Karime Pereida, Kindred
Oier Mees, University of California, Berkeley
Ralf Römer, Technical University of Munich
Martin Schuck, Technical University of Munich
Siqi Zhou, Technical University of Munich
