ICRA 2026 Workshop on Semantics for Reliable Robot Autonomy: From Environment Understanding and Reasoning to Safe Interaction

Overview

Enabled by increasingly accessible robot hardware, robotic applications are undergoing a shift from specialized tasks in well-defined settings toward broader, general-purpose autonomy. In such domains, robots must not only perceive and localize within their environment but also interpret it semantically. This includes recognizing what is in the world as well as understanding the relevant properties of objects and reasoning about the implications of their actions (e.g., a warehouse robot must identify fragile items on the floor and infer that running over them could cause damage or create a safety hazard). Recent progress in large-scale models and language-conditioned learning opens new opportunities for robots to contextualize their actions, grounding them in common sense and semantic knowledge. Yet, leveraging these advances requires balancing prior knowledge with data-driven flexibility, designing appropriate environment representations to facilitate downstream tasks, and ensuring that semantic reasoning is reliably translated into safe behaviour. This workshop aims to foster interdisciplinary discussions across robot learning, perception, estimation, and control, while also drawing inspiration from linguistics and cognitive science. By bringing together these communities, the workshop will (i) explore methods that harness semantic understanding to advance reliable robot autonomy, and (ii) identify the opportunities and pressing challenges for real-world applications.


Discussion Themes

Reliable autonomy in robots requires not only perceiving objects, but also understanding relationships, context, and temporal dynamics to act safely in complex, real-world environments. With advances in large-scale datasets, multimodal sensing (vision, touch, proprioception), and language-based reasoning, there is growing potential to enable robots to reason about their surroundings and adaptively plan safe behaviours. These developments raise important questions about (i) how best to represent and reason about environments and knowledge, including semantic hierarchies and temporal context, and (ii) how to ensure safe interactions, including the role of priors, sensor modalities, and uncertainty. Addressing these challenges requires a multidisciplinary effort that integrates perspectives from perception, language, learning, cognitive science, and control. This workshop is designed to foster interdisciplinary discussions by bringing together experts from academia and industry to share insights across the two interconnected themes described below.

Theme A. Environment Perception and Contextual Reasoning

– What are the strengths and limitations of different representations of the robot operating environment (e.g., dense metric maps, scene graphs)? What properties are necessary in these representations to facilitate downstream planning and control tasks (e.g., differentiability and uncertainty quantification)?
– What role should language play in robotics for knowledge representation and generalizable skill acquisition with prioritized safety and reliability? What is the “right” balance between more modular approaches versus end-to-end methods?
– How can we handle semantic hierarchies that may span arbitrarily many levels? For instance, is it necessary to represent articulation and sub-parts, and where should we draw the boundary?
– How do we effectively leverage high-dimensional sensor data (camera, lidar, radar) with foundation models for contextual reasoning in robotics? How do we characterize and account for uncertainties from perception, especially in dynamic or changing scenes?
– Many approaches focus on answering “what is” rather than “what was” or “what will be.” How do we best incorporate temporal aspects into understanding and reasoning? Do advances in areas such as video understanding or generation translate directly to this challenge? What are the opportunities in developing world models for robotic applications, given their promising results in computer vision (e.g., video generation)?

Theme B. Safe Interaction in Physical Environments

– Large-scale models are trained on very diverse datasets. How can we obtain enough unsafe data to cover “all possibilities”? What types of datasets or attributes are necessary to build reliable, safe behaviours? Is the current reliance on synthetic data a viable approach?
– Which sensor modalities are best suited to achieving safe interaction? Is RGB alone, as used by many current end-to-end approaches, enough? Consider that RGB data is abundantly available, while other modalities, such as haptic feedback, are not.
– How can we map high-level understanding of the environment into constraints and objectives that are compatible with current planning and control-theoretic frameworks? Is language a sufficient medium to provide the grounding needed to capture notions of safety in unstructured, everyday applications?
– Should we explicitly design systems to deal with uncertainty and hallucination in large-scale models? Or, do large-scale models implicitly learn to deal with this?
– How should we evaluate “safe approaches”? Do policies derived from data need to be validated with even more data to gain confidence in their effectiveness for real-world deployment? How can we close the gap and reach the reliability required by industry?

Program

Below is a tentative program of the workshop; the times are in CEST. The morning and afternoon sessions each include a series of invited talks, a panel discussion, as well as short lightning talks and a poster session.

Theme A. Environment Perception and Contextual Reasoning

08:50 – 09:10 Opening Remarks and Theme A Introduction
09:10 – 09:35 Invited Talk: Marco Pavone, “Reasoning VLA Models for Vehicle Autonomy”
09:35 – 10:00 Invited Talk: Angel Chang, “Building and Reasoning with Vision-Language Maps for Embodied AI”
10:00 – 10:25 Invited Talk: Lukas Schmid, “Spatio-Temporal Reasoning over Objects and Humans”
10:25 – 10:45 Lightning Talks
10:45 – 11:15 Coffee Break and Morning Poster Session
11:15 – 11:40 Invited Talk: Janet Wiles, “Diverse Intelligences: Humans, Robots and Mycelium”
11:40 – 12:05 Invited Talk: Karol Hausman, “Language as a Connective Tissue for Robotics”
12:05 – 12:50 Theme A Panel Discussion

Theme B. Safe Interaction in Physical Environments

13:50 – 14:05 Theme B Introduction
14:05 – 14:30 Invited Talk: George Pappas, “LLM-Enabled Robots: Jailbreaking Attacks and Defenses”
14:30 – 14:55 Invited Talk: Mengdi Xu, “Building an Adaptable Generalist Robot: A Human-Centered Perspective”
14:55 – 15:15 Lightning Talks
15:15 – 15:45 Coffee Break and Afternoon Poster Session
15:45 – 16:10 Invited Talk: Ken Goldberg, “Semantic Queries of Robot Data”
16:10 – 16:35 Invited Talk: Dongheui Lee, “Semantics for Robot Task Execution Monitoring”
16:35 – 17:20 Theme B Panel Discussion
17:20 – 17:30 Concluding Remarks

Post-Workshop Social Event (from 17:30 to 18:30)

Call for Papers

We invite researchers from different disciplines to share novel ideas and results on topics pertinent to the workshop themes, which include but are not limited to:

  • Perception methods incorporating semantic, geometric, and multi-modal information
  • Efficient 3D object and environment representations from multi-modal sensor inputs
  • Uncertainty estimation for 3D perception
  • Contextual reasoning of the 3D environments (e.g., object relations, affordance, traversability)
  • Safe motion planning and control under semantic uncertainties
  • Robot skill acquisition and learning leveraging semantic information
  • Multi-agent collaboration through semantic information
  • Foundation-model-based perception and decision-making methods

The review process will be double-blind. Accepted papers will be published on the workshop webpage and will be presented as a spotlight talk or as a poster. To recognize outstanding contributions, we will present awards for Best Paper and Best Presentation towards the end of the workshop. If you have any questions, please contact us at contact.lsy@xcit.tum.de.

Paper Format

Suggested Length: minimum 2 and maximum 4 pages, excluding references
Style Template: (IEEE conference paper template)

Important Dates

Initial Submission: April 1, 2026 (11:59 pm AoE)
Author Notification: May 15, 2026
Camera Ready Submission: May 20, 2026 (11:59 pm AoE)
Workshop Date: June 5, 2026

OpenReview Submission Link

Please note that submission requires an OpenReview account, and accounts without educational or institutional emails may take up to two weeks to activate.
http://tiny.cc/ICRA2026SRRA

Invited Speakers


Marco Pavone
Stanford University & Nvidia

Angel Chang
Simon Fraser University

Lukas Schmid
Massachusetts Institute of Technology

Janet Wiles
University of Queensland

Karol Hausman
Physical Intelligence (π)

George Pappas
University of Pennsylvania

Mengdi Xu
Tsinghua University

Ken Goldberg
UC Berkeley

Dongheui Lee
TU Wien & German Aerospace Center

Organizers


Angela Schoellig
TU Munich & University of Toronto

Somil Bansal
Stanford University

SiQi Zhou
Simon Fraser University

Oier Mees
Microsoft

Lukas Brunke
TU Munich & University of Toronto

Niklas Schlueter
TU Munich

Haoming Zhang
TU Munich

Sponsors



University of Toronto Institute for Aerospace Studies