IROS 2022 Safe Robot Learning Competition


Advances in robotics promise improved functionality, efficiency and quality with an impact on many aspects of our daily lives. Example applications include autonomous driving, drone delivery, and service robots. However, the decision-making of such systems often faces multiple sources of uncertainty (e.g., incomplete sensory information, uncertainties in the environment, and interaction with other agents). Deploying an embodied system in real-world and possibly commercial applications requires both (i) safety guarantees that the system acts reliably in the presence of the various sources of uncertainty and (ii) efficient deployment of the decision-making algorithm to the physical world (for performance and cost-effectiveness). As highlighted in the “Roadmap for US Robotics”, learning and adaptation are essential for next-generation robotics applications, and guaranteeing safety is an integral part of this.

In our recent review paper, we have observed that there is a growing interest in developing safe robot learning approaches from both the control and reinforcement learning communities. While the safe learning control problem is equally recognized as an important problem in both communities, there is a lack of comparison or systematic head-to-head evaluation of the approaches developed by the two communities. This is often due to the different sets of assumptions made in the process of developing the decision-making algorithm (also known as “controller”). In 2021, our team organized two workshops on safe robot learning at the IROS conference and the NeurIPS conference, respectively, to foster interdisciplinary discussions on the pressing challenges related to the design and deployment of robot learning algorithms to real-world applications.

With this competition, our goal is to bring together researchers from different communities to (1) solicit novel and data-efficient robot learning algorithms, (2) establish a common forum to compare control and reinforcement learning approaches for safe robot decision making, and (3) identify the shortcomings or bottlenecks of the state-of-the-art algorithms with respect to real-world deployment.

Relevance to IROS Community

Our goal closely aligns with the theme of this year’s IROS conference: “Embodied AI for a Symbiotic Society”. Learning will be an essential component in every aspect of the robot software stack, and safety is paramount in real-world applications. Through this competition, we wish to uncover the essential requirements for transferring robot learning algorithms from simulation to the physical world and to lay the groundwork for future research directions that combine the expertise of the control and learning communities.

We also hope that easy-to-use and cheaply reproducible evaluation environments (e.g., through physics-based simulation) will increase accessibility to research in the area of safe robot learning and speed up its progress through quantitative apples-to-apples comparisons.

Agile and safe flight has many potential applications, including search and rescue—where agility is required to cover as much area as possible under the constraint of limited battery life while safe flight is needed to avoid catastrophic crashes or harm to humans. This competition will give teams the opportunity to showcase how these two capacities could be combined.

One of the envisioned outcomes of the competition is a collaborative paper in which the top 3 teams in each track describe their methodologies. The competition results, from both the simulation and experiment phases, will be used to illustrate and compare the features of the different approaches.

Competition Style

The competition includes two simulation (virtual) phases and an experimental (remote) phase. The fully virtual simulation components of the competition will be based on an open-source software benchmark suite we are currently developing: safe-control-gym. In the final experimental phase of the competition (to facilitate participation amid COVID) we will provide remote access to real robotic hardware via high-speed internet connections to our Flight Arena at the University of Toronto Institute for Aerospace Studies in Toronto, Canada.

A. Simulated Environment Illustration

B. Experiment Environment Illustration


The task we consider is based on a nano quadrotor platform (Bitcraze’s Crazyflie). The quadrotor is required to navigate, as fast as possible, through an environment with a set of obstacles. The participating teams are required to show that their algorithms can safely navigate the quadrotor through increasingly cluttered environments and learn to cope with the following types of uncertainties: (1) variations/uncertainties in the obstacle positions and shapes, (2) vehicle modifications such as an added mass or payload, and (3) unforeseen aerodynamic disturbances (e.g., wind or downwash from other moving quadrotors). Safe/robust learning and adaptation approaches will be required to capture the unknown or unstructured uncertainties present in the setup. The task is considered unsuccessful if there is any collision with the obstacles and unsafe if any of a set of safety constraints is violated (e.g., off-track maneuvers, coming closer to obstacles than a minimum safety distance, exceeding a maximum completion time). The evaluation criteria will be based on the performance as measured by (1) completion time, (2) safety constraint satisfaction, and (3) the maximal clutteredness of the environment that can be accounted for by the algorithm.
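The distinction between an unsuccessful run (a collision) and an unsafe run (a constraint violation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the spherical obstacle model, the sampled trajectory format, and the names `Obstacle` and `check_trajectory` are hypothetical and are not part of the competition software.

```python
import math
from dataclasses import dataclass


@dataclass
class Obstacle:
    # Hypothetical obstacle model: a sphere given by its center and radius (meters).
    x: float
    y: float
    z: float
    radius: float


def check_trajectory(trajectory, obstacles, min_clearance, max_time):
    """Classify a flown trajectory.

    trajectory: list of (t, x, y, z) samples, with t in seconds.
    Returns (collided, n_violations): a collision marks the run as
    unsuccessful; violations mark it as unsafe.
    """
    collided = False
    n_violations = 0
    for _, x, y, z in trajectory:
        for ob in obstacles:
            # Distance from the vehicle position to the obstacle surface.
            clearance = math.dist((x, y, z), (ob.x, ob.y, ob.z)) - ob.radius
            if clearance <= 0.0:
                collided = True
            elif clearance < min_clearance:
                n_violations += 1
    # Temporal constraint: the last sample must not exceed the time limit.
    if trajectory and trajectory[-1][0] > max_time:
        n_violations += 1
    return collided, n_violations
```

For example, a sample that passes 0.1 m from an obstacle surface with a required clearance of 0.15 m counts as one violation but not as a collision.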

Competition Tracks

Offline learning track: The environment is subject to static uncertainties (i.e., variations/uncertainties in obstacle positions and shapes, vehicle modifications such as added mass or payload). The participants will be given fixed-sized datasets that can be used to learn the uncertainties in the environment. The overall dataset will be made up of three components: (i) a dataset with the data collected from the Crazyflie platform, to learn the uncertainties in the dynamics, (ii) a dataset that captures the variations/uncertainties in the obstacle placements, and (iii) a dataset that includes possible obstacle geometries. In this track, the learning process is completed offline, before executing the tasks.

Online learning track: The environment is subject to both static uncertainties (i.e., variations/uncertainties in obstacle positions and shapes, vehicle modifications such as added mass or payload) and dynamic uncertainties (e.g., downwash from an adjacent moving quadrotor vehicle, or unforeseen wind disturbances). The teams will be given the same dataset as in the “Offline Learning Track” for learning the static uncertainties. To cope with the dynamic, unforeseen uncertainties, the teams will need to additionally design a learning or adaptive component based on minimal online information. In this track, the online learning process happens during the execution of the tasks.

Competition Phases

Preliminary phase (simulation): Registered participants are given access to the virtual platform to get familiar with the software setup. This initial test environment will be simulated, and a sufficiently accurate model of the robot and the cluttered environment will be provided for the design of navigation algorithms. The teams need to demonstrate basic navigation capabilities.

Selection phase (simulation): Registered participants will be given a more challenging environment subject to uncertainties (e.g., variations/uncertainties in the obstacle positions and shapes, vehicle modifications such as an added mass or payload, and unforeseen aerodynamic disturbances) to resemble real-world challenges. The teams are expected to demonstrate the capabilities of the robot learning algorithms to cope with the static uncertainties in the “Offline Learning Track” or both static and dynamic uncertainties in the “Online Learning Track” in increasingly challenging simulated environments (e.g., with narrower spacing and irregular geometries). The evaluation will be based on a fixed number (e.g., 5) of trials corresponding to randomly generated obstacle configurations.
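One way to picture a fixed number of trials over randomly generated obstacle configurations is the sketch below. It assumes, purely for illustration, that each configuration perturbs nominal planar obstacle positions by a bounded uniform offset; the function name, the offset model, and the seed handling are hypothetical and do not describe the actual evaluation procedure.

```python
import random


def sample_obstacle_configs(nominal_positions, max_offset, n_trials=5, seed=0):
    """Generate n_trials randomized obstacle layouts.

    nominal_positions: list of (x, y) nominal obstacle positions in meters.
    max_offset: largest displacement applied along each axis, in meters.
    A fixed seed makes the set of trials reproducible across submissions.
    """
    rng = random.Random(seed)
    configs = []
    for _ in range(n_trials):
        config = [
            (x + rng.uniform(-max_offset, max_offset),
             y + rng.uniform(-max_offset, max_offset))
            for x, y in nominal_positions
        ]
        configs.append(config)
    return configs
```

Using the same seed for every team would let all controllers be compared on identical layouts.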

Final phase (experimental): The selected teams will test the proposed algorithms in a real experimental setup (via remote access to the University of Toronto’s Flight Arena). A dataset collected from the real experimental setup will be provided to the selected teams to learn the static uncertainties in the setup, similar to the selection phase above. Our team, in Toronto and (if restrictions allow) in Kyoto, will facilitate remote access to the robot platform in Canada for both online and onsite IROS participants. As in the selection phase, the finalist teams will need to demonstrate the ability of their algorithms to cope with static uncertainties in the “Offline Learning Track” or with both static and dynamic uncertainties in the “Online Learning Track”. On the competition days, we will have two scenarios: (A) environments with obstacles and disturbances resembling those provided in the simulated environment in the selection phase and (B) environments with novel obstacles and disturbances. Each scenario will consist of a fixed number (e.g., 3) of trials corresponding to obstacle configurations of increasing difficulty.

Note that, for an algorithm that involves continuous interaction with the environment (e.g., reinforcement learning or adaptive control approaches), safety constraint violations will be checked during both the learning stage and the test stage.

Scoring System

Unlike past quadrotor slalom competitions, our proposal does not focus on the vision aspect of the problem but rather on the quantifiable safety and generalizability of learning-based control laws. To do so, we will score approaches not only on pure flight performance (shortest time, shortest distance) but also on safety (e.g., collision or near-collision under perturbed state estimation, inertial properties, obstacle positions, etc.). The overall score of the flight will be computed based on three criteria:

  • Performance: The amount of time that is required for the quadrotor to navigate through the environment.
  • Safety constraint violation: The number of violations of spatial constraints (i.e., off-track maneuvers, coming closer to obstacles than a minimum safety distance) and temporal constraints (i.e., exceeding a maximum completion time).
  • Clutteredness of the environment: This criterion characterizes the difficulty of the task. It is measured by the percentage of the environment's volume that is occupied by the static and dynamic obstacles.
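As a rough illustration of the clutteredness criterion, the sketch below computes the percentage of a box-shaped flight volume occupied by obstacles. It assumes axis-aligned box obstacles that do not overlap and lie entirely inside the arena; the function name and this geometric model are hypothetical simplifications.

```python
def clutteredness(arena_size, obstacle_sizes):
    """Percentage of the arena volume occupied by obstacles.

    arena_size: (length, width, height) of the flight volume in meters.
    obstacle_sizes: list of (length, width, height) boxes in meters,
        assumed non-overlapping and fully inside the arena.
    """
    arena_volume = arena_size[0] * arena_size[1] * arena_size[2]
    if arena_volume <= 0:
        raise ValueError("arena volume must be positive")
    occupied = sum(l * w * h for l, w, h in obstacle_sizes)
    return 100.0 * occupied / arena_volume
```

For a 4 m x 4 m x 2 m arena with two 0.5 m x 0.5 m x 2 m pillars, this gives 3.125 percent.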

The quadrotors are encouraged to fly as fast as possible while not violating a set of predefined constraints (e.g., virtual tracks, minimum distance to static and dynamic obstacles, and time constraints). Any violation will result in a penalty in the score. The clutteredness of the environment will be a weighting factor applied to the performance reward and the safety constraint violation penalty.
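The exact scoring formula is not specified here. The following sketch only shows one possible way the three criteria could be combined, with clutteredness acting as a weight on both the performance reward and the violation penalty. The reward shape, the `time_limit` and `violation_penalty` parameters, and their default values are assumptions chosen for illustration.

```python
def flight_score(completion_time, n_violations, clutteredness_pct,
                 time_limit=30.0, violation_penalty=1.0):
    """Hypothetical score: clutteredness-weighted reward minus penalty.

    completion_time: seconds needed to traverse the environment.
    n_violations: number of safety constraint violations.
    clutteredness_pct: occupied volume of the environment, in percent.
    """
    # Assumed reward: seconds saved relative to the time limit, never negative.
    performance_reward = max(0.0, time_limit - completion_time)
    # Assumed penalty: a fixed cost per violation.
    penalty = violation_penalty * n_violations
    # Clutteredness weights both terms, as described above.
    weight = clutteredness_pct / 100.0
    return weight * (performance_reward - penalty)
```

Under these assumptions, a faster flight or a more cluttered environment raises the score, and each violation lowers it.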

Submission system style: We will use a web application—explicitly created for this competition—to submit the source code of the learning-based solution/control approach. These controllers will be automatically executed and scored against the simulation environments as well as—for the finalist teams only—transferred to our Toronto-based team to be asynchronously run on the experimental quadrotor hardware. In case of onsite participation in Kyoto, we will set up a booth with multiple workstations to provide in-person support to the participants submitting to both the simulation and remote experimental environments.

Tentative Competition Schedule

July 1: Competition release and registration opens
July 10: Online introduction of the competition
July 10 – August 10: Competition preliminary phase
August 10 – September 10: Competition selection phase
September 10: Registration closes
September 20: Notification of acceptance to the final phase of the competition
October 3: Data release for the competition final phase
October 5: Competition final phase information session
October 10 – October 17: Preliminary hardware testing with finalist teams
October 24 and October 25: Competition days
November 1 – December 25: Dissemination

Organizing Committee

Angela Schoellig, University of Toronto and Vector Institute
Davide Scaramuzza, University of Zurich
Nicholas Roy, Massachusetts Institute of Technology
Vijay Kumar, University of Pennsylvania
Todd Murphey, Northwestern University
Sebastian Trimpe, RWTH Aachen University
Mark Mueller, University of California Berkeley
Jose Martinez-Carranza, Instituto Nacional de Astrofisica Optica y Electronica
SiQi Zhou, University of Toronto and Vector Institute
Melissa Greeff, University of Toronto and Vector Institute
Jacopo Panerati, University of Toronto and Vector Institute
Yunlong Song, University of Zurich
Leticia Oyuki Rojas Pérez, Instituto Nacional de Astrofisica Optica y Electronica

University of Toronto Institute for Aerospace Studies