CoRL 2024 Workshop on Mastering Robot Manipulation in a World of Abundant Data

Overview

Manipulation is a crucial skill for fully autonomous robots operating in complex, real-world environments. As robots move into dynamic, human-centric spaces, it is increasingly important to develop reliable and versatile manipulation abilities. With the availability of large datasets (e.g., RT-X) and recent advances in robot learning and perception (e.g., deep RL, diffusion-based policies, and language-conditioned methods), there has been significant progress in acquiring new skills, building common-sense understanding, and enabling natural interaction in human-centric environments. These advances raise new questions about (i) which learning methods best utilize abundant data to learn versatile and reliable manipulation policies and (ii) which modalities (e.g., visual, tactile) and sources (e.g., real-world collection, high-fidelity contact simulation) of training data are needed for acquiring general-purpose skills. In this workshop, we aim to facilitate an interdisciplinary exchange between the robot learning, computer vision, manipulation, and control communities. Our goal is to map out the potential and limitations of current large-scale data-driven methods and to discuss pressing challenges and opportunities in diversifying data modalities and sources for mastering robot manipulation in real-world applications.

Discussion Themes | Call for Papers | Invited Speakers | Program | Organizers

Discussion Themes

Our workshop comprises two closely related themes with invited talks from experts in each.

Theme A: Learning Methods for Versatile and Reliable Manipulation

– What are the roles of RL, imitation learning, and foundation models in manipulation, and how do we best leverage these methods/tools to achieve human-like learning and refinement of manipulation skills?
– Is scaling with large models and diverse datasets the way toward acquiring general-purpose manipulation skills? How do we best exploit our prior knowledge to facilitate versatile but also reliable learning? What are some challenges arising from cross-embodiment learning?
– How can foundation models trained on large datasets reach the high reliability (99.9% and above) required in many real-world (industrial) applications? What are the criteria for real-world deployment?
– Will the common sense/reasoning capability enabled by foundation models improve the robustness of robot learning algorithms in the long run?

Theme B: Data Collection and Sensor Modalities for General-Purpose Skill Acquisition

– We have seen a proliferation of LLMs and VLMs in the robot decision-making software stack. Which sensor data modalities are required for learning and reliable deployment of manipulation skills?
– When is tactile feedback required for manipulation, and how can it be combined with vision? Can we train gripper-agnostic foundation models for dexterous manipulation?
– What role does internet video data play, and is simulation necessary to generate synthetic data? How can we collect informative data in the real world and effectively combine it with synthetic data for “in-the-wild” task learning?
– How can manipulation datasets containing different data modalities be effectively combined for cross-embodiment learning?

Call for Papers

We are inviting researchers from different disciplines to share novel ideas on topics pertinent to the workshop themes, which include but are not limited to:

  • Foundation models for robot learning
  • Diffusion and energy-based policies for robot manipulation
  • Deep reinforcement learning for real-world robot grasping and manipulation
  • Real-world datasets and simulators for general-purpose skill acquisition
  • Comparisons of foundation-model-based methods and conventional robot learning methods (e.g., task generalization versus performance)
  • Visuo-tactile sensing for robot manipulation and/or methods leveraging multimodalities
  • Environment perception and representation for robot learning
  • Positions on what robots are not yet able to do (i.e., the challenges at the cutting edge of one or multiple subfields)
  • Best practices for data collection and aggregation (multimodality, teleoperation, examples to include)

The review process will be double-blind. Accepted papers will be published on the workshop webpage and will be presented as a spotlight talk or as a poster. If you have any questions, please contact us at contact.lsy@xcit.tum.de.

Paper Format

Suggested Length: 2–4 pages, excluding references
Style Template: CoRL Paper Template

Important Dates

Initial Submission: October 15, 2024 (11:59 pm AoE)
Author Notification: October 29, 2024
Camera Ready Submission: November 01, 2024 (11:59 pm AoE)
Workshop Date: November 09, 2024

OpenReview Submission Link

http://tiny.cc/corl24-mrm-d-submission

Invited Speakers

Sergey Levine
UC Berkeley

Ankur Handa
NVIDIA

Ted Xiao
Google DeepMind

Shuran Song
Stanford University

Mohsen Kaboli
BMW and TU/e

Carlo Sferrazza
UC Berkeley

Youngwoon Lee
Yonsei University

Program

Below is a tentative program for the workshop. All times are in CET. Each theme session includes a brief introduction, a set of 20-minute invited talks, and a 30-minute moderated panel discussion. A spotlight talks session takes place in between.

Theme A: Learning Methods for Versatile and Reliable Manipulation

08:45 – 09:00: Opening Remarks and Theme A Introduction
09:00 – 10:30: Theme A Invited Talks
10:30 – 11:00: Coffee Break
11:00 – 11:30: Theme A Invited Talks
11:30 – 12:00: Theme A Panel Discussion

Spotlights and Poster Sessions

12:00 – 13:45: Lunch Break and Poster Session
13:45 – 13:55: Theme B Introduction
13:55 – 14:30: Spotlight Talks

Theme B: Data Collection and Sensor Modalities for General-Purpose Skill Acquisition

14:30 – 15:30: Theme B Invited Talks
15:30 – 16:00: Coffee Break
16:00 – 17:30: Theme B Invited Talks
17:30 – 18:00: Theme B Panel Discussion

Organizers

Angela Schoellig, Technical University of Munich and University of Toronto
Animesh Garg, Georgia Institute of Technology and NVIDIA
Karime Pereida, Kindred
Oier Mees, University of California Berkeley
Ralf Römer, Technical University of Munich
Martin Schuck, Technical University of Munich
Siqi Zhou, Technical University of Munich
