Safe and Robust Robot Learning in Unknown Environments

While learning is crucial for robots to operate effectively in unknown environments, providing safety guarantees during the learning process has recently been identified as one of the key open problems limiting the applicability of learning algorithms in real-world robotics applications. We address this challenge by combining robust and predictive control theory with Gaussian process regression. Our algorithms have been evaluated on ground and aerial vehicles. Another application area we work on is high-performance control for collaborative (human-robot) mobile manipulation in dynamic environments.
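As a rough illustration of the Gaussian process ingredient mentioned above, the sketch below fits a zero-mean GP to a handful of model-error samples and reports a 3σ band around the prediction. It is a minimal sketch under stated assumptions, not the method of any paper listed on this page: the squared-exponential kernel, the hyperparameter values, the function names `rbf_kernel` and `gp_posterior`, and the toy data are all hypothetical choices made for the example.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise_var=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    # Cholesky factorization for a numerically stable solve.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Hypothetical data: model error observed at a few scalar states.
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)

# One query inside the visited region and one far outside it.
x_query = np.array([0.0, 5.0])
mean, std = gp_posterior(x_train, y_train, x_query)

# 3-sigma band that a robust controller could treat as an uncertainty bound.
lower, upper = mean - 3.0 * std, mean + 3.0 * std
print(mean, std, lower, upper)
```

The band is narrow where data has been collected and wide far from it, which is the property a robust controller can exploit: it stays cautious in unvisited regions and can become less conservative as data accumulates.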

 

Related Publications

Multi-robot transfer learning: a dynamical system perspective
M. K. Helwa and A. P. Schoellig
in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017. Accepted.
Multi-robot transfer learning allows a robot to use data generated by a second, similar robot to improve its own behavior. The potential advantages are reducing the time of training and the unavoidable risks that exist during the training phase. Transfer learning algorithms aim to find an optimal transfer map between different robots. In this paper, we investigate, through a theoretical study of single-input single-output (SISO) systems, the properties of such optimal transfer maps. We first show that the optimal transfer learning map is, in general, a dynamic system. The main contribution of the paper is to provide an algorithm for determining the properties of this optimal dynamic map including its order and regressors (i.e., the variables it depends on). The proposed algorithm does not require detailed knowledge of the robots’ dynamics, but relies on basic system properties easily obtainable through simple experimental tests. We validate the proposed algorithm experimentally through an example of transfer learning between two different quadrotor platforms. Experimental results show that an optimal dynamic map, with correct properties obtained from our proposed algorithm, achieves 60-70% reduction of transfer learning error compared to the cases when the data is directly transferred or transferred using an optimal static map.

@INPROCEEDINGS{helwa-iros17,
author={Mohamed K. Helwa and Angela P. Schoellig},
title={Multi-Robot Transfer Learning: A Dynamical System Perspective},
booktitle={{Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}},
year={2017},
note={Accepted},
urllink={https://arxiv.org/abs/1707.08689},
}

Learning multimodal models for robot dynamics online with a mixture of Gaussian process experts
C. D. McKinnon and A. P. Schoellig
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 322-328.

For decades, robots have been essential allies alongside humans in controlled industrial environments like heavy manufacturing facilities. However, without the guidance of a trusted human operator to shepherd a robot safely through a wide range of conditions, they have been barred from the complex, ever changing environments that we live in from day to day. Safe learning control has emerged as a promising way to start bridging algorithms based on first principles to complex real-world scenarios by using data to adapt, and improve performance over time. Safe learning methods rely on a good estimate of the robot dynamics and of the bounds on modelling error in order to be effective. Current methods focus on either a single adaptive model, or a fixed, known set of models for the robot dynamics. This limits them to static or slowly changing environments. This paper presents a method using Gaussian Processes in a Dirichlet Process mixture model to learn an increasing number of non-linear models for the robot dynamics. We show that this approach enables a robot to re-use past experience from an arbitrary number of previously visited operating conditions, and to automatically learn a new model when a new and distinct operating condition is encountered. This approach improves the robustness of existing Gaussian Process-based models to large changes in dynamics that do not have to be specified ahead of time.

@INPROCEEDINGS{mckinnon-icra17,
author = {Christopher D. McKinnon and Angela P. Schoellig},
title = {Learning multimodal models for robot dynamics online with a mixture of {G}aussian process experts},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
year = {2017},
pages = {322--328},
doi = {10.1109/ICRA.2017.7989041},
}

Conservative to confident: treating uncertainty robustly within learning-based control
C. J. Ostafew, A. P. Schoellig, and T. D. Barfoot
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2015, pp. 421-427.

Robust control maintains stability and performance for a fixed amount of model uncertainty but can be conservative since the model is not updated online. Learning-based control, on the other hand, uses data to improve the model over time but is not typically guaranteed to be robust throughout the process. This paper proposes a novel combination of both ideas: a robust Min-Max Learning-Based Nonlinear Model Predictive Control (MM-LB-NMPC) algorithm. Based on an existing LB-NMPC algorithm, we present an efficient and robust extension, altering the NMPC performance objective to optimize for the worst-case scenario. The algorithm uses a simple a priori vehicle model and a learned disturbance model. Disturbances are modelled as a Gaussian Process (GP) based on experience collected during previous trials as a function of system state, input, and other relevant variables. Nominal state sequences are predicted using an Unscented Transform and worst-case scenarios are defined as sequences bounding the 3σ confidence region. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, GPS-denied environments. The paper presents experimental results from testing on a 50 kg skid-steered robot executing a path-tracking task. The results show reductions in maximum lateral and heading path-tracking errors by up to 30% and a clear transition from robust control when the model uncertainty is high to optimal control when model uncertainty is reduced.

@INPROCEEDINGS{ostafew-icra15,
author = {Chris J. Ostafew and Angela P. Schoellig and Timothy D. Barfoot},
title = {Conservative to confident: treating uncertainty robustly within learning-based control},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
pages = {421--427},
year = {2015},
doi = {10.1109/ICRA.2015.7139033},
note = {},
}

Safe and robust learning control with Gaussian processes
F. Berkenkamp and A. P. Schoellig
in Proc. of the European Control Conference (ECC), 2015, pp. 2501-2506.

This paper introduces a learning-based robust control algorithm that provides robust stability and performance guarantees during learning. The approach uses Gaussian process (GP) regression based on data gathered during operation to update an initial model of the system and to gradually decrease the uncertainty related to this model. Embedding this data-based update scheme in a robust control framework guarantees stability during the learning process. Traditional robust control approaches have not considered online adaptation of the model and its uncertainty before. As a result, their controllers do not improve performance during operation. Typical machine learning algorithms that have achieved similar high-performance behavior by adapting the model and controller online do not provide the guarantees presented in this paper. In particular, this paper considers a stabilization task, linearizes the nonlinear, GP-based model around a desired operating point, and solves a convex optimization problem to obtain a linear robust controller. The resulting performance improvements due to the learning-based controller are demonstrated in experiments on a quadrotor vehicle.

@INPROCEEDINGS{berkenkamp-ecc15,
author = {Felix Berkenkamp and Angela P. Schoellig},
title = {Safe and robust learning control with {G}aussian processes},
booktitle = {{Proc. of the European Control Conference (ECC)}},
pages = {2501--2506},
year = {2015},
doi = {10.1109/ECC.2015.7330913},
urlvideo={https://youtu.be/YqhLnCm0KXY?list=PLC12E387419CEAFF2},
urlslides={../../wp-content/papercite-data/slides/berkenkamp-ecc15-slides.pdf},
}

University of Toronto Institute for Aerospace Studies