Overview
Manipulation is a crucial skill for fully autonomous robots operating in complex, real-world environments. As robots move into dynamic, human-centric spaces, it is increasingly important to develop reliable and versatile manipulation abilities. With the availability of large datasets (e.g., RT-X) and recent advances in robot learning and perception (e.g., deep RL, diffusion, and language-conditioned methods), there has been significant progress in acquiring new skills, reasoning with common sense, and enabling natural interaction in human-centric environments. These advances raise new questions about (i) which learning methods best utilize abundant data to learn versatile and reliable manipulation policies and (ii) which modalities (e.g., visual, tactile) and sources (e.g., real-world data, high-fidelity contact simulations) of training data are needed for acquiring general-purpose skills. In this workshop, we aim to facilitate an interdisciplinary exchange between the robot learning, computer vision, manipulation, and control communities. Our goal is to map out the potential and limitations of current large-scale data-driven methods and to discuss pressing challenges and opportunities in diversifying data modalities and sources for mastering robot manipulation in real-world applications.
Discussion Themes
Our workshop comprises two closely related themes with invited talks from experts in each.
Theme A: Learning Methods for Versatile and Reliable Manipulation
– What are the roles of RL, imitation learning, and foundation models in manipulation, and how do we best leverage these methods/tools to achieve human-like learning and refinement of manipulation skills?
– Is scaling with large models and diverse datasets the way toward acquiring general-purpose manipulation skills? How do we best exploit our prior knowledge to facilitate versatile but also reliable learning? What are some challenges arising from cross-embodiment learning?
– How can foundation models trained on large datasets reach high reliability (99.9+%) as required in many real-world (industrial) applications? What are some criteria for real-world deployment?
– Will the common sense/reasoning capability enabled by foundation models improve the robustness of robot learning algorithms in the long run?
Theme B: Data Collection and Sensor Modalities for General-Purpose Skill Acquisition
– We have seen a proliferation of LLMs and VLMs in the robot decision-making software stack. Which sensor data modalities are required for learning and reliable deployment of manipulation skills?
– When is tactile feedback required for manipulation, and how can it be combined with vision? Can we train gripper-agnostic foundation models for dexterous manipulation?
– What role does internet video data play, and is simulation necessary to generate synthetic data? How can we collect informative data in the real world and effectively combine it with synthetic data for “in-the-wild” task learning?
– How can manipulation datasets containing different data modalities be effectively combined for cross-embodiment learning?
Call for Papers
We are inviting researchers from different disciplines to share novel ideas on topics pertinent to the workshop themes, which include but are not limited to:
- Foundation models for robot learning
- Diffusion and energy-based policies for robot manipulation
- Deep reinforcement learning for real-world robot grasping and manipulation
- Real-world datasets and simulators for general-purpose skill acquisition
- Comparisons of foundation-model-based methods and conventional robot learning methods (e.g., task generalization versus performance)
- Visuo-tactile sensing for robot manipulation and/or methods leveraging multiple modalities
- Environment perception and representation for robot learning
- Positions on what robots are not yet able to do (i.e., the challenges at the cutting edge of one or multiple subfields)
- Best practices for data collection and aggregation (multimodality, teleoperation, examples to include)
The review process will be double-blind. Accepted papers will be published on the workshop webpage and will be presented as a spotlight talk or as a poster. If you have any questions, please contact us at contact.lsy@xcit.tum.de.
Paper Format
Suggested Length: 2 to 4 pages, excluding references
Style Template: CoRL Paper Template
Important Dates
Initial Submission: October 15, 2024 (11:59 pm AoE)
Author Notification: October 29, 2024
Camera-Ready Submission: November 01, 2024 (11:59 pm AoE)
Workshop Date: November 09, 2024
OpenReview Submission Link
http://tiny.cc/corl24-mrm-d-submission
Invited Speakers
Sergey Levine, University of California Berkeley
Nathan Ratliff, NVIDIA
Danica Kragic, KTH
Ted Xiao, Google DeepMind
Shuran Song, Stanford University
Katerina Fragkiadaki, CMU
Mohsen Kaboli, BMW and TU/e
Carlo Sferrazza, University of California Berkeley
Youngwoon Lee, Yonsei University
Program
Below is a tentative program for the workshop. Times are in CET. Themes A and B each have a 10-minute introduction, a set of 20-minute invited talks, and a 40-minute moderated panel discussion. In between, we have a spotlight talk session and a poster session.
Theme A: Learning Methods for Versatile and Reliable Manipulation
09:00 – 09:10: Opening Remarks
09:10 – 09:20: Theme A Introduction
09:20 – 10:50: Theme A Invited Talks
10:50 – 11:10: Coffee Break
11:10 – 11:50: Theme A Panel Discussion
Spotlights and Poster Sessions
11:50 – 12:20: Spotlight Talks
12:20 – 13:50: Lunch Break and Poster Session
Theme B: Data Collection and Sensor Modalities for General-Purpose Skill Acquisition
13:50 – 14:00: Theme B Introduction
14:00 – 15:50: Theme B Invited Talks
15:50 – 16:10: Coffee Break
16:10 – 16:50: Theme B Panel Discussion
16:50 – 17:00: Concluding Remarks
Organizers
Angela Schoellig, Technical University of Munich and University of Toronto
Animesh Garg, Georgia Institute of Technology and NVIDIA
Karime Pereida, Kindred
Oier Mees, University of California Berkeley
Ralf Römer, Technical University of Munich
Martin Schuck, Technical University of Munich
Siqi Zhou, Technical University of Munich