Safe and Robust Robot Learning in Unknown Environments

Learning can be used to improve the performance of a robotic system in a complex environment. However, providing safety guarantees during the learning process is one of the key challenges that prevents these algorithms from being applied to real world applications. We aim to address this by combining robust and predictive control theory with probabilistic modelling techniques such as Gaussian Process Regression. In addition, some of our recent work has focused on enabling these systems adapt safely to new conditions that arise as a natural part of deploying robots in realistic outdoor environments for long periods of time. Our algorithms have been evaluated on ground and aerial vehicles as well as on mobile manipulators for human-robot collaboration.

 

Related Publications

[DOI] Context-aware cost shaping to reduce the impact of model error in safe, receding horizon control
C. D. McKinnon and A. P. Schoellig
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2020, pp. 2386-2392.
[View BibTeX] [View Abstract] [Download PDF] [View Video]

This paper presents a method to enable a robot using stochastic Model Predictive Control (MPC) to achieve high performance on a repetitive path-following task. In particular, we consider the case where the accuracy of the model for robot dynamics varies significantly over the path–motivated by the fact that the models used in MPC must be computationally efficient, which limits their expressive power. Our approach is based on correcting the cost predicted using a simple learned dynamics model over the MPC horizon. This discourages the controller from taking actions that lead to higher cost than would have been predicted using the dynamics model. In addition, stochastic MPC provides a quantitative measure of safety by limiting the probability of violating state and input constraints over the prediction horizon. Our approach is unique in that it combines both online model learning and cost learning over the prediction horizon and is geared towards operating a robot in changing conditions. We demonstrate our algorithm in simulation and experiment on a ground robot that uses a stereo camera for localization.

@INPROCEEDINGS{mckinnon-icra20,
title = {Context-aware Cost Shaping to Reduce the Impact of Model Error in Safe, Receding Horizon Control},
author = {Christopher D. McKinnon and Angela P. Schoellig},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
year = {2020},
pages = {2386-2392},
doi = {10.1109/ICRA40945.2020.9197521},
urlvideo = {https://youtu.be/xrgcO2-A9bo},
abstract = {This paper presents a method to enable a robot using stochastic Model Predictive Control (MPC) to achieve high performance on a repetitive path-following task. In particular, we consider the case where the accuracy of the model for robot dynamics varies significantly over the path–motivated by the fact that the models used in MPC must be computationally efficient, which limits their expressive power. Our approach is based on correcting the cost predicted using a simple learned dynamics model over the MPC horizon. This discourages the controller from taking actions that lead to higher cost than would have been predicted using the dynamics model. In addition, stochastic MPC provides a quantitative measure of safety by limiting the probability of violating state and input constraints over the prediction horizon. Our approach is unique in that it combines both online model learning and cost learning over the prediction horizon and is geared towards operating a robot in changing conditions. We demonstrate our algorithm in simulation and experiment on a ground robot that uses a stereo camera for localization.}
}

[DOI] Provably robust learning-based approach for high-accuracy tracking control of Lagrangian systems
M. K. Helwa, A. Heins, and A. P. Schoellig
IEEE Robotics and Automation Letters, vol. 4, iss. 2, p. 1587–1594, 2019.
[View BibTeX] [View Abstract] [Download PDF] [View Video] [More Information]

Lagrangian systems represent a wide range of robotic systems, including manipulators, wheeled and legged robots, and quadrotors. Inverse dynamics control and feed-forward linearization techniques are typically used to convert the complex nonlinear dynamics of Lagrangian systems to a set of decoupled double integrators, and then a standard, outer-loop controller can be used to calculate the commanded acceleration for the linearized system. However, these methods typically depend on having a very accurate system model, which is often not available in practice. While this challenge has been addressed in the literature using different learning approaches, most of these approaches do not provide safety guarantees in terms of stability of the learning-based control system. In this paper, we provide a novel, learning-based control approach based on Gaussian processes (GPs) that ensures both stability of the closed-loop system and high-accuracy tracking. We use GPs to approximate the error between the commanded acceleration and the actual acceleration of the system, and then use the predicted mean and variance of the GP to calculate an upper bound on the uncertainty of the linearized model. This uncertainty bound is then used in a robust, outer-loop controller to ensure stability of the overall system. Moreover, we show that the tracking error converges to a ball with a radius that can be made arbitrarily small. Furthermore, we verify the effectiveness of our approach via simulations on a 2 degree-of-freedom (DOF) planar manipulator and experimentally on a 6 DOF industrial manipulator.

@article{helwa-ral19,
title = {Provably Robust Learning-Based Approach for High-Accuracy Tracking Control of {L}agrangian Systems},
author = {Mohamed K. Helwa and Adam Heins and Angela P. Schoellig},
journal = {{IEEE Robotics and Automation Letters}},
year = {2019},
volume = {4},
number = {2},
pages = {1587--1594},
doi = {10.1109/LRA.2019.2896728},
urllink = {https://arxiv.org/pdf/1804.01031.pdf},
urlvideo = {https://youtu.be/CBmZ4F79gmI},
abstract = {Lagrangian systems represent a wide range of robotic systems, including manipulators, wheeled and legged robots, and quadrotors. Inverse dynamics control and feed-forward linearization techniques are typically used to convert the complex nonlinear dynamics of Lagrangian systems to a set of decoupled double integrators, and then a standard, outer-loop controller can be used to calculate the commanded acceleration for the linearized system. However, these methods typically depend on having a very accurate system model, which is often not available in practice. While this challenge has been addressed in the literature using different learning approaches, most of these approaches do not provide safety guarantees in terms of stability of the learning-based control system. In this paper, we provide a novel, learning-based control approach based on Gaussian processes (GPs) that ensures both stability of the closed-loop system and high-accuracy tracking. We use GPs to approximate the error between the commanded acceleration and the actual acceleration of the system, and then use the predicted mean and variance of the GP to calculate an upper bound on the uncertainty of the linearized model. This uncertainty bound is then used in a robust, outer-loop controller to ensure stability of the overall system. Moreover, we show that the tracking error converges to a ball with a radius that can be made arbitrarily small. Furthermore, we verify the effectiveness of our approach via simulations on a 2 degree-of-freedom (DOF) planar manipulator and experimentally on a 6 DOF industrial manipulator.}
}

Learn fast, forget slow: safe predictive control for systems with locally linear actuator dynamics performing repetitive tasks
C. D. McKinnon and A. P. Schoellig
IEEE Robotics and Automation Letters, vol. 4, iss. 2, p. 2180–2187, 2019.
[View BibTeX] [View Abstract] [Download PDF] [View Video] [More Information]

We present a control method for improved repetitive path following for a ground vehicle that is geared towards long-term operation where the operating conditions can change over time and are initially unknown. We use weighted Bayesian Linear Regression to model the unknown actuator dynamics, and show how this simple model is more accurate in both its estimate of the mean behaviour and model uncertainty than Gaussian Process Regression and generalizes to novel operating conditions with little or no tuning. In addition, it allows us to use fast adaptation and long-term learning in one, unified framework, to adapt quickly to new operating conditions and learn repetitive model errors over time. This comes with the added benefit of lower computational cost, longer look-ahead, and easier optimization when the model is used in a robust, Model Predictive controller (MPC). In order to fully capitalize on the long prediction horizons that are possible with this new approach, we use Tube MPC to reduce predicted uncertainty growth. We demonstrate the effectiveness of our approach in experiment on a 900 kg ground robot showing results over 2.7 km of driving with both physical and artificial changes to the robot’s dynamics. All of our experiments are conducted using a stereo camera for localization.

@article{mckinnon-ral19,
title={Learn Fast, Forget Slow: Safe Predictive Control for Systems with Locally Linear Actuator Dynamics Performing Repetitive Tasks},
author={Christopher D. McKinnon and Angela P. Schoellig},
journal = {{IEEE Robotics and Automation Letters}},
year = {2019},
volume = {4},
number = {2},
pages = {2180--2187},
urllink={https://arxiv.org/abs/1810.06681},
urlvideo={https://youtu.be/fLNMtYabuU4},
abstract = {We present a control method for improved repetitive path following for a ground vehicle that is geared towards long-term operation where the operating conditions can change over time and are initially unknown. We use weighted Bayesian Linear Regression to model the unknown actuator dynamics, and show how this simple model is more accurate in both its estimate of the mean behaviour and model uncertainty than Gaussian Process Regression and generalizes to novel operating conditions with little or no tuning. In addition, it allows us to use fast adaptation and long-term learning in one, unified framework, to adapt quickly to new operating conditions and learn repetitive model errors over time. This comes with the added benefit of lower computational cost, longer look-ahead, and easier optimization when the model is used in a robust, Model Predictive controller (MPC). In order to fully capitalize on the long prediction horizons that are possible with this new approach, we use Tube MPC to reduce predicted uncertainty growth. We demonstrate the effectiveness of our approach in experiment on a 900 kg ground robot showing results over 2.7 km of driving with both physical and artificial changes to the robot's dynamics. All of our experiments are conducted using a stereo camera for localization.}
}

Learning probabilistic models for safe predictive control in unknown environments
C. D. McKinnon and A. P. Schoellig
in Proc. of the European Control Conference (ECC), 2019, p. 2472–2479.
[View BibTeX] [View Abstract] [Download PDF]

Researchers rely increasingly on tools from machine learning to improve the performance of control algorithms on real world tasks and enable robots to operate for long periods of time without intervention. Many of these algorithms require a model for the dynamics of the robot. In particular, researchers designing methods for safe learning control often rely on an upper bound on model error to make guarantees about the worst-case closed-loop performance of their algorithm. There are different options for how to learn such a model of the robot dynamics. We study probabilistic models for use in the context of stochastic model predictive control. Two popular choices for learning the robot dynamics are Gaussian Process (GP) regression and various forms of local linear regression. In this paper, we present a study comparing GPs with a particular form of local linear regression for learning robot dynamics with the aim of guaranteeing safety when a robot operates in novel conditions. We show results based on experimental data from a 900 kg ground robot using vision for localisation.

@INPROCEEDINGS{mckinnon-ecc19,
author = {Christopher D. McKinnon and Angela P. Schoellig},
title = {Learning Probabilistic Models for Safe Predictive Control in Unknown Environments},
booktitle = {{Proc. of the European Control Conference (ECC)}},
year = {2019},
pages = {2472--2479},
abstract = {Researchers rely increasingly on tools from machine learning to improve the performance of control algorithms on real world tasks and enable robots to operate for long periods of time without intervention. Many of these algorithms require a model for the dynamics of the robot. In particular, researchers designing methods for safe learning control often rely on an upper bound on model error to make guarantees about the worst-case closed-loop performance of their algorithm. There are different options for how to learn such a model of the robot dynamics. We study probabilistic models for use in the context of stochastic model predictive control. Two popular choices for learning the robot dynamics are Gaussian Process (GP) regression and various forms of local linear regression. In this paper, we present a study comparing GPs with a particular form of local linear regression for learning robot dynamics with the aim of guaranteeing safety when a robot operates in novel conditions. We show results based on experimental data from a 900 kg ground robot using vision for localisation.},
}

Safe model-based reinforcement learning with stability guarantees
F. Berkenkamp, M. Turchetta, A. P. Schoellig, and A. Krause
in Proc. of Neural Information Processing Systems (NIPS), 2017, p. 908–918.
[View BibTeX] [View Abstract] [Download PDF] [More Information]

Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied on safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety in terms of stability guarantees. Specifically, we extend control theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down.

@INPROCEEDINGS{berkenkamp-nips17,
title = {Safe model-based reinforcement learning with stability guarantees},
booktitle = {{Proc. of Neural Information Processing Systems (NIPS)}},
author = {Felix Berkenkamp and Matteo Turchetta and Angela P. Schoellig and Andreas Krause},
year = {2017},
urllink = {https://arxiv.org/abs/1705.08551},
pages = {908--918},
abstract = {Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied on safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety in terms of stability guarantees. Specifically, we extend control theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down.},
}

[DOI] Multi-robot transfer learning: a dynamical system perspective
M. K. Helwa and A. P. Schoellig
in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 4702-4708.
[View BibTeX] [View Abstract] [Download PDF] [More Information]

Multi-robot transfer learning allows a robot to use data generated by a second, similar robot to improve its own behavior. The potential advantages are reducing the time of training and the unavoidable risks that exist during the training phase. Transfer learning algorithms aim to find an optimal transfer map between different robots. In this paper, we investigate, through a theoretical study of single-input single-output (SISO) systems, the properties of such optimal transfer maps. We first show that the optimal transfer learning map is, in general, a dynamic system. The main contribution of the paper is to provide an algorithm for determining the properties of this optimal dynamic map including its order and regressors (i.e., the variables it depends on). The proposed algorithm does not require detailed knowledge of the robots’ dynamics, but relies on basic system properties easily obtainable through simple experimental tests. We validate the proposed algorithm experimentally through an example of transfer learning between two different quadrotor platforms. Experimental results show that an optimal dynamic map, with correct properties obtained from our proposed algorithm, achieves 60-70% reduction of transfer learning error compared to the cases when the data is directly transferred or transferred using an optimal static map.

@INPROCEEDINGS{helwa-iros17,
author={Mohamed K. Helwa and Angela P. Schoellig},
title={Multi-Robot Transfer Learning: A Dynamical System Perspective},
booktitle={{Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}},
year={2017},
pages={4702-4708},
doi={10.1109/IROS.2017.8206342},
urllink={https://arxiv.org/abs/1707.08689},
abstract={Multi-robot transfer learning allows a robot to use data generated by a second, similar robot to improve its own behavior. The potential advantages are reducing the time of training and the unavoidable risks that exist during the training phase. Transfer learning algorithms aim to find an optimal transfer map between different robots. In this paper, we investigate, through a theoretical study of single-input single-output (SISO) systems, the properties of such optimal transfer maps. We first show that the optimal transfer learning map is, in general, a dynamic system. The main contribution of the paper is to provide an algorithm for determining the properties of this optimal dynamic map including its order and regressors (i.e., the variables it depends on). The proposed algorithm does not require detailed knowledge of the robots’ dynamics, but relies on basic system properties easily obtainable through simple experimental tests. We validate the proposed algorithm experimentally through an example of transfer learning between two different quadrotor platforms. Experimental results show that an optimal dynamic map, with correct properties obtained from our proposed algorithm, achieves 60-70% reduction of transfer learning error compared to the cases when the data is directly transferred or transferred using an optimal static map.},
}

Design of deep neural networks as add-on blocks for improving impromptu trajectory tracking
S. Zhou, M. K. Helwa, and A. P. Schoellig
in Proc. of the IEEE Conference on Decision and Control (CDC), 2017, p. 5201–5207.
[View BibTeX] [View Abstract] [Download PDF] [More Information]

This paper introduces deep neural networks (DNNs) as add-on blocks to baseline feedback control systems to enhance tracking performance of arbitrary desired trajectories. The DNNs are trained to adapt the reference signals to the feedback control loop. The goal is to achieve a unity map between the desired and the actual outputs. In previous work, the efficacy of this approach was demonstrated on quadrotors; on 30 unseen test trajectories, the proposed DNN approach achieved an average impromptu tracking error reduction of 43% as compared to the baseline feedback controller. Motivated by these results, this work aims to provide platform-independent design guidelines for the proposed DNN-enhanced control architecture. In particular, we provide specific guidelines for the DNN feature selection, derive conditions for when the proposed approach is effective, and show in which cases the training efficiency can be further increased.

@INPROCEEDINGS{zhou-cdc17,
author={SiQi Zhou and Mohamed K. Helwa and Angela P. Schoellig},
title={Design of Deep Neural Networks as Add-on Blocks for Improving Impromptu Trajectory Tracking},
booktitle = {{Proc. of the IEEE Conference on Decision and Control (CDC)}},
year = {2017},
pages={5201--5207},
urllink = {https://arxiv.org/pdf/1705.10932.pdf},
abstract = {This paper introduces deep neural networks (DNNs) as add-on blocks to baseline feedback control systems to enhance tracking performance of arbitrary desired trajectories. The DNNs are trained to adapt the reference signals to the feedback control loop. The goal is to achieve a unity map between the desired and the actual outputs. In previous work, the efficacy of this approach was demonstrated on quadrotors; on 30 unseen test trajectories, the proposed DNN approach achieved an average impromptu tracking error reduction of 43% as compared to the baseline feedback controller. Motivated by these results, this work aims to provide platform-independent design guidelines for the proposed DNN-enhanced control architecture. In particular, we provide specific guidelines for the DNN feature selection, derive conditions for when the proposed approach is effective, and show in which cases the training efficiency can be further increased.}
}

Constrained Bayesian optimization with particle swarms for safe adaptive controller tuning
R. R. P. R. Duivenvoorden, F. Berkenkamp, N. Carion, A. Krause, and A. P. Schoellig
in Proc. of the IFAC (International Federation of Automatic Control) World Congress, 2017, p. 12306–12313.
[View BibTeX] [View Abstract] [Download PDF]

Tuning controller parameters is a recurring and time-consuming problem in control. This is especially true in the field of adaptive control, where good performance is typically only achieved after significant tuning. Recently, it has been shown that constrained Bayesian optimization is a promising approach to automate the tuning process without risking system failures during the optimization process. However, this approach is computationally too expensive for tuning more than a couple of parameters. In this paper, we provide a heuristic in order to efficiently perform constrained Bayesian optimization in high-dimensional parameter spaces by using an adaptive discretization based on particle swarms. We apply the method to the tuning problem of an L1 adaptive controller on a quadrotor vehicle and show that we can reliably and automatically tune parameters in experiments.

@INPROCEEDINGS{duivenvoorden-ifac17,
author = {Rikky R.P.R. Duivenvoorden and Felix Berkenkamp and Nicolas Carion and Andreas Krause and Angela P. Schoellig},
title = {Constrained {B}ayesian Optimization with Particle Swarms for Safe Adaptive Controller Tuning},
booktitle = {{Proc. of the IFAC (International Federation of Automatic Control) World Congress}},
year = {2017},
pages = {12306--12313},
abstract = {Tuning controller parameters is a recurring and time-consuming problem in control. This is especially true in the field of adaptive control, where good performance is typically only achieved after significant tuning. Recently, it has been shown that constrained Bayesian optimization is a promising approach to automate the tuning process without risking system failures during the optimization process. However, this approach is computationally too expensive for tuning more than a couple of parameters. In this paper, we provide a heuristic in order to efficiently perform constrained Bayesian optimization in high-dimensional parameter spaces by using an adaptive discretization based on particle swarms. We apply the method to the tuning problem of an L1 adaptive controller on a quadrotor vehicle and show that we can reliably and automatically tune parameters in experiments.},
}

[DOI] Learning multimodal models for robot dynamics online with a mixture of Gaussian process experts
C. D. McKinnon and A. P. Schoellig
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2017, p. 322–328.
[View BibTeX] [View Abstract] [Download PDF]

For decades, robots have been essential allies alongside humans in controlled industrial environments like heavy manufacturing facilities. However, without the guidance of a trusted human operator to shepherd a robot safely through a wide range of conditions, they have been barred from the complex, ever changing environments that we live in from day to day. Safe learning control has emerged as a promising way to start bridging algorithms based on first principles to complex real-world scenarios by using data to adapt, and improve performance over time. Safe learning methods rely on a good estimate of the robot dynamics and of the bounds on modelling error in order to be effective. Current methods focus on either a single adaptive model, or a fixed, known set of models for the robot dynamics. This limits them to static or slowly changing environments. This paper presents a method using Gaussian Processes in a Dirichlet Process mixture model to learn an increasing number of non-linear models for the robot dynamics. We show that this approach enables a robot to re-use past experience from an arbitrary number of previously visited operating conditions, and to automatically learn a new model when a new and distinct operating condition is encountered. This approach improves the robustness of existing Gaussian Process-based models to large changes in dynamics that do not have to be specified ahead of time.

@INPROCEEDINGS{mckinnon-icra17,
author = {Christopher D. McKinnon and Angela P. Schoellig},
title = {Learning multimodal models for robot dynamics online with a mixture of {G}aussian process experts},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
year = {2017},
pages = {322--328},
doi = {10.1109/ICRA.2017.7989041},
abstract = {For decades, robots have been essential allies alongside humans in controlled industrial environments like heavy manufacturing facilities. However, without the guidance of a trusted human operator to shepherd a robot safely through a wide range of conditions, they have been barred from the complex, ever changing environments that we live in from day to day. Safe learning control has emerged as a promising way to start bridging algorithms based on first principles to complex real-world scenarios by using data to adapt, and improve performance over time. Safe learning methods rely on a good estimate of the robot dynamics and of the bounds on modelling error in order to be effective. Current methods focus on either a single adaptive model, or a fixed, known set of models for the robot dynamics. This limits them to static or slowly changing environments. This paper presents a method using Gaussian Processes in a Dirichlet Process mixture model to learn an increasing number of non-linear models for the robot dynamics. We show that this approach enables a robot to re-use past experience from an arbitrary number of previously visited operating conditions, and to automatically learn a new model when a new and distinct operating condition is encountered. This approach improves the robustness of existing Gaussian Process-based models to large changes in dynamics that do not have to be specified ahead of time.},
}

[DOI] Deep neural networks for improved, impromptu trajectory tracking of quadrotors
Q. Li, J. Qian, Z. Zhu, X. Bao, M. K. Helwa, and A. P. Schoellig
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2017, p. 5183–5189.
[View BibTeX] [View Abstract] [Download PDF] [View Video] [More Information]

Trajectory tracking control for quadrotors is important for applications ranging from surveying and inspection, to film making. However, designing and tuning classical controllers, such as proportional-integral-derivative (PID) controllers, to achieve high tracking precision can be time-consuming and difficult, due to hidden dynamics and other non-idealities. The Deep Neural Network (DNN), with its superior capability of approximating abstract, nonlinear functions, proposes a novel approach for enhancing trajectory tracking control. This paper presents a DNN-based algorithm as an add-on module that improves the tracking performance of a classical feedback controller. Given a desired trajectory, the DNNs provide a tailored reference input to the controller based on their gained experience. The input aims to achieve a unity map between the desired and the output trajectory. The motivation for this work is an interactive �fly-as-you-draw� application, in which a user draws a trajectory on a mobile device, and a quadrotor instantly flies that trajectory with the DNN-enhanced control system. Experimental results demonstrate that the proposed approach improves the tracking precision for user-drawn trajectories after the DNNs are trained on selected periodic trajectories, suggesting the method�s potential in real-world applications. Tracking errors are reduced by around 40-50% for both training and testing trajectories from users, highlighting the DNNs� capability of generalizing knowledge.

@INPROCEEDINGS{li-icra17,
author = {Qiyang Li and Jingxing Qian and Zining Zhu and Xuchan Bao and Mohamed K. Helwa and Angela P. Schoellig},
title = {Deep neural networks for improved, impromptu trajectory tracking of quadrotors},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
year = {2017},
pages = {5183--5189},
doi = {10.1109/ICRA.2017.7989607},
urllink = {https://arxiv.org/abs/1610.06283},
urlvideo = {https://youtu.be/r1WnMUZy9-Y},
abstract = {Trajectory tracking control for quadrotors is important for applications ranging from surveying and inspection, to film making. However, designing and tuning classical controllers, such as proportional-integral-derivative (PID) controllers, to achieve high tracking precision can be time-consuming and difficult, due to hidden dynamics and other non-idealities. The Deep Neural Network (DNN), with its superior capability of approximating abstract, nonlinear functions, proposes a novel approach for enhancing trajectory tracking control. This paper presents a DNN-based algorithm as an add-on module that improves the tracking performance of a classical feedback controller. Given a desired trajectory, the DNNs provide a tailored reference input to the controller based on their gained experience. The input aims to achieve a unity map between the desired and the output trajectory. The motivation for this work is an interactive �fly-as-you-draw� application, in which a user draws a trajectory on a mobile device, and a quadrotor instantly flies that trajectory with the DNN-enhanced control system. Experimental results demonstrate that the proposed approach improves the tracking precision for user-drawn trajectories after the DNNs are trained on selected periodic trajectories, suggesting the method�s potential in real-world applications. Tracking errors are reduced by around 40-50% for both training and testing trajectories from users, highlighting the DNNs� capability of generalizing knowledge.},
}

[DOI] Safe controller optimization for quadrotors with Gaussian processes
F. Berkenkamp, A. P. Schoellig, and A. Krause
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2016, p. 491–496.
[View BibTeX] [View Abstract] [Download PDF] [View Video] [View 2nd Video] [Code] [More Information]

One of the most fundamental problems when designing controllers for dynamic systems is the tuning of the controller parameters. Typically, a model of the system is used to obtain an initial controller, but ultimately the controller parameters must be tuned manually on the real system to achieve the best performance. To avoid this manual tuning step, methods from machine learning, such as Bayesian optimization, have been used. However, as these methods evaluate different controller parameters on the real system, safety-critical system failures may happen. In this paper, we overcome this problem by applying, for the first time, a recently developed safe optimization algorithm, SafeOpt, to the problem of automatic controller parameter tuning. Given an initial, low-performance controller, SafeOpt automatically optimizes the parameters of a control law while guaranteeing safety. It models the underlying performance measure as a Gaussian process and only explores new controller parameters whose performance lies above a safe performance threshold with high probability. Experimental results on a quadrotor vehicle indicate that the proposed method enables fast, automatic, and safe optimization of controller parameters without human intervention.

@INPROCEEDINGS{berkenkamp-icra16,
author = {Felix Berkenkamp and Angela P. Schoellig and Andreas Krause},
title = {Safe controller optimization for quadrotors with {G}aussian processes},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
year = {2016},
month = {May},
pages = {491--496},
doi = {10.1109/ICRA.2016.7487170},
urllink = {http://arxiv.org/abs/1509.01066},
urlvideo = {https://www.youtube.com/watch?v=GiqNQdzc5TI},
urlvideo2 = {https://www.youtube.com/watch?v=IYi8qMnt0yU},
urlcode = {https://github.com/befelix/SafeOpt},
abstract = {One of the most fundamental problems when designing controllers for dynamic systems is the tuning of the controller parameters. Typically, a model of the system is used to obtain an initial controller, but ultimately the controller parameters must be tuned manually on the real system to achieve the best performance. To avoid this manual tuning step, methods from machine learning, such as Bayesian optimization, have been used. However, as these methods evaluate different controller parameters on the real system, safety-critical system failures may happen. In this paper, we overcome this problem by applying, for the first time, a recently developed safe optimization algorithm, SafeOpt, to the problem of automatic controller parameter tuning. Given an initial, low-performance controller, SafeOpt automatically optimizes the parameters of a control law while guaranteeing safety. It models the underlying performance measure as a Gaussian process and only explores new controller parameters whose performance lies above a safe performance threshold with high probability. Experimental results on a quadrotor vehicle indicate that the proposed method enables fast, automatic, and safe optimization of controller parameters without human intervention.},
}

[DOI] Robust constrained learning-based NMPC enabling reliable mobile robot path tracking
C. J. Ostafew, A. P. Schoellig, and T. D. Barfoot
International Journal of Robotics Research, vol. 35, iss. 13, pp. 1547-1563, 2016.
[View BibTeX] [View Abstract] [Download PDF] [View Video]

This paper presents a Robust Constrained Learning-based Nonlinear Model Predictive Control (RC-LB-NMPC) algorithm for path-tracking in off-road terrain. For mobile robots, constraints may represent solid obstacles or localization limits. As a result, constraint satisfaction is required for safety. Constraint satisfaction is typically guaranteed through the use of accurate, a priori models or robust control. However, accurate models are generally not available for off-road operation. Furthermore, robust controllers are often conservative, since model uncertainty is not updated online. In this work our goal is to use learning to generate low-uncertainty, non-parametric models in situ. Based on these models, the predictive controller computes both linear and angular velocities in real-time, such that the robot drives at or near its capabilities while respecting path and localization constraints. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, off-road environments. The paper presents experimental results, including over 5 km of travel by a 900 kg skid-steered robot at speeds of up to 2.0 m/s. The result is a robust, learning controller that provides safe, conservative control during initial trials when model uncertainty is high and converges to high-performance, optimal control during later trials when model uncertainty is reduced with experience.

@ARTICLE{ostafew-ijrr16,
author = {Chris J. Ostafew and Angela P. Schoellig and Timothy D. Barfoot},
title = {Robust Constrained Learning-Based {NMPC} Enabling Reliable Mobile Robot Path Tracking},
year = {2016},
journal = {{International Journal of Robotics Research}},
volume = {35},
number = {13},
pages = {1547-1563},
doi = {10.1177/0278364916645661},
url = {http://dx.doi.org/10.1177/0278364916645661},
eprint = {http://dx.doi.org/10.1177/0278364916645661},
urlvideo = {https://youtu.be/3xRNmNv5Efk},
abstract = {This paper presents a Robust Constrained Learning-based Nonlinear Model Predictive Control (RC-LB-NMPC) algorithm for path-tracking in off-road terrain. For mobile robots, constraints may represent solid obstacles or localization limits. As a result, constraint satisfaction is required for safety. Constraint satisfaction is typically guaranteed through the use of accurate, a priori models or robust control. However, accurate models are generally not available for off-road operation. Furthermore, robust controllers are often conservative, since model uncertainty is not updated online. In this work our goal is to use learning to generate low-uncertainty, non-parametric models in situ. Based on these models, the predictive controller computes both linear and angular velocities in real-time, such that the robot drives at or near its capabilities while respecting path and localization constraints. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, off-road environments. The paper presents experimental results, including over 5 km of travel by a 900 kg skid-steered robot at speeds of up to 2.0 m/s. The result is a robust, learning controller that provides safe, conservative control during initial trials when model uncertainty is high and converges to high-performance, optimal control during later trials when model uncertainty is reduced with experience.},
}

[DOI] Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes
F. Berkenkamp, R. Moriconi, A. P. Schoellig, and A. Krause
in Proc. of the IEEE Conference on Decision and Control (CDC), 2016, pp. 4661-4666.
[View BibTeX] [View Abstract] [Download PDF] [View Video] [Code] [Code 2] [Download Slides] [More Information]

Control theory can provide useful insights into the properties of controlled, dynamic systems. One important property of nonlinear systems is the region of attraction (ROA), a safe subset of the state space in which a given controller renders an equilibrium point asymptotically stable. The ROA is typically estimated based on a model of the system. However, since models are only an approximation of the real world, the resulting estimated safe region can contain states outside the ROA of the real system. This is not acceptable in safety-critical applications. In this paper, we consider an approach that learns the ROA from experiments on a real system, without ever leaving the true ROA and, thus, without risking safety-critical failures. Based on regularity assumptions on the model errors in terms of a Gaussian process prior, we use an underlying Lyapunov function in order to determine a region in which an equilibrium point is asymptotically stable with high probability. Moreover, we provide an algorithm to actively and safely explore the state space in order to expand the ROA estimate. We demonstrate the effectiveness of this method in simulation.

@INPROCEEDINGS{berkenkamp-cdc16,
author = {Felix Berkenkamp and Riccardo Moriconi and Angela P. Schoellig and Andreas Krause},
title = {Safe learning of regions of attraction for uncertain, nonlinear systems with {G}aussian processes},
booktitle = {{Proc. of the IEEE Conference on Decision and Control (CDC)}},
year = {2016},
pages = {4661-4666},
doi = {10.1109/CDC.2016.7798979},
urllink = {http://arxiv.org/abs/1603.04915},
urlvideo = {https://youtu.be/bSv-pNOWn7c},
urlslides={../../wp-content/papercite-data/slides/berkenkamp-cdc16-slides.pdf},
urlcode = {https://github.com/befelix/lyapunov-learning},
urlcode2 = {http://berkenkamp.me/jupyter/lyapunov},
abstract = {Control theory can provide useful insights into the properties of controlled, dynamic systems. One important property of nonlinear systems is the region of attraction (ROA), a safe subset of the state space in which a given controller renders an equilibrium point asymptotically stable. The ROA is typically estimated based on a model of the system. However, since models are only an approximation of the real world, the resulting estimated safe region can contain states outside the ROA of the real system. This is not acceptable in safety-critical applications. In this paper, we consider an approach that learns the ROA from experiments on a real system, without ever leaving the true ROA and, thus, without risking safety-critical failures. Based on regularity assumptions on the model errors in terms of a Gaussian process prior, we use an underlying Lyapunov function in order to determine a region in which an equilibrium point is asymptotically stable with high probability. Moreover, we provide an algorithm to actively and safely explore the state space in order to expand the ROA estimate. We demonstrate the effectiveness of this method in simulation.}
}

Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics
F. Berkenkamp, A. Krause, and A. P. Schoellig
Technical Report, arXiv, 2016.
[View BibTeX] [View Abstract] [Download PDF] [View Video] [Code] [More Information]

Robotic algorithms typically depend on various parameters, the choice of which significantly affects the robot’s performance. While an initial guess for the parameters may be obtained from dynamic models of the robot, parameters are usually tuned manually on the real system to achieve the best performance. Optimization algorithms, such as Bayesian optimization, have been used to automate this process. However, these methods may evaluate unsafe parameters during the optimization process that lead to safety-critical system failures. Recently, a safe Bayesian optimization algorithm, called SafeOpt, has been developed, which guarantees that the performance of the system never falls below a critical value; that is, safety is defined based on the performance function. However, coupling performance and safety is often not desirable in robotics. For example, high-gain controllers might achieve low average tracking error (performance), but can overshoot and violate input constraints. In this paper, we present a generalized algorithm that allows for multiple safety constraints separate from the objective. Given an initial set of safe parameters, the algorithm maximizes performance but only evaluates parameters that satisfy safety for all constraints with high probability. To this end, it carefully explores the parameter space by exploiting regularity assumptions in terms of a Gaussian process prior. Moreover, we show how context variables can be used to safely transfer knowledge to new situations and tasks. We provide a theoretical analysis and demonstrate that the proposed algorithm enables fast, automatic, and safe optimization of tuning parameters in experiments on a quadrotor vehicle.

@TECHREPORT{berkenkamp-tr16,
title = {Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics},
institution = {arXiv},
author = {Berkenkamp, Felix and Krause, Andreas and Schoellig, Angela P.},
year = {2016},
urllink = {http://arxiv.org/abs/1602.04450},
urlvideo = {https://youtu.be/GiqNQdzc5TI},
urlcode = {https://github.com/befelix/SafeOpt},
abstract = {Robotic algorithms typically depend on various parameters, the choice of which significantly affects the robot's performance. While an initial guess for the parameters may be obtained from dynamic models of the robot, parameters are usually tuned manually on the real system to achieve the best performance. Optimization algorithms, such as Bayesian optimization, have been used to automate this process. However, these methods may evaluate unsafe parameters during the optimization process that lead to safety-critical system failures. Recently, a safe Bayesian optimization algorithm, called SafeOpt, has been developed, which guarantees that the performance of the system never falls below a critical value; that is, safety is defined based on the performance function. However, coupling performance and safety is often not desirable in robotics. For example, high-gain controllers might achieve low average tracking error (performance), but can overshoot and violate input constraints. In this paper, we present a generalized algorithm that allows for multiple safety constraints separate from the objective. Given an initial set of safe parameters, the algorithm maximizes performance but only evaluates parameters that satisfy safety for all constraints with high probability. To this end, it carefully explores the parameter space by exploiting regularity assumptions in terms of a Gaussian process prior. Moreover, we show how context variables can be used to safely transfer knowledge to new situations and tasks. We provide a theoretical analysis and demonstrate that the proposed algorithm enables fast, automatic, and safe optimization of tuning parameters in experiments on a quadrotor vehicle.}
}

[DOI] Safe and robust learning control with Gaussian processes
F. Berkenkamp and A. P. Schoellig
in Proc. of the European Control Conference (ECC), 2015, p. 2501–2506.
[View BibTeX] [View Abstract] [Download PDF] [View Video] [Download Slides]

This paper introduces a learning-based robust control algorithm that provides robust stability and performance guarantees during learning. The approach uses Gaussian process (GP) regression based on data gathered during operation to update an initial model of the system and to gradually decrease the uncertainty related to this model. Embedding this data-based update scheme in a robust control framework guarantees stability during the learning process. Traditional robust control approaches have not considered online adaptation of the model and its uncertainty before. As a result, their controllers do not improve performance during operation. Typical machine learning algorithms that have achieved similar high-performance behavior by adapting the model and controller online do not provide the guarantees presented in this paper. In particular, this paper considers a stabilization task, linearizes the nonlinear, GP-based model around a desired operating point, and solves a convex optimization problem to obtain a linear robust controller. The resulting performance improvements due to the learning-based controller are demonstrated in experiments on a quadrotor vehicle.

@INPROCEEDINGS{berkenkamp-ecc15,
author = {Felix Berkenkamp and Angela P. Schoellig},
title = {Safe and robust learning control with {G}aussian processes},
booktitle = {{Proc. of the European Control Conference (ECC)}},
pages = {2501--2506},
year = {2015},
doi = {10.1109/ECC.2015.7330913},
urlvideo={https://youtu.be/YqhLnCm0KXY?list=PLC12E387419CEAFF2},
urlslides={../../wp-content/papercite-data/slides/berkenkamp-ecc15-slides.pdf},
abstract = {This paper introduces a learning-based robust control algorithm that provides robust stability and performance guarantees during learning. The approach uses Gaussian process (GP) regression based on data gathered during operation to update an initial model of the system and to gradually decrease the uncertainty related to this model. Embedding this data-based update scheme in a robust control framework guarantees stability during the learning process. Traditional robust control approaches have not considered online adaptation of the model and its uncertainty before. As a result, their controllers do not improve performance during operation. Typical machine learning algorithms that have achieved similar high-performance behavior by adapting the model and controller online do not provide the guarantees presented in this paper. In particular, this paper considers a stabilization task, linearizes the nonlinear, GP-based model around a desired operating point, and solves a convex optimization problem to obtain a linear robust controller. The resulting performance improvements due to the learning-based controller are demonstrated in experiments on a quadrotor vehicle.}
}

[DOI] Conservative to confident: treating uncertainty robustly within learning-based control
C. J. Ostafew, A. P. Schoellig, and T. D. Barfoot
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2015, p. 421–427.
[View BibTeX] [View Abstract] [Download PDF]

Robust control maintains stability and performance for a fixed amount of model uncertainty but can be conservative since the model is not updated online. Learning- based control, on the other hand, uses data to improve the model over time but is not typically guaranteed to be robust throughout the process. This paper proposes a novel combination of both ideas: a robust Min-Max Learning-Based Nonlinear Model Predictive Control (MM-LB-NMPC) algorithm. Based on an existing LB-NMPC algorithm, we present an efficient and robust extension, altering the NMPC performance objective to optimize for the worst-case scenario. The algorithm uses a simple a priori vehicle model and a learned disturbance model. Disturbances are modelled as a Gaussian Process (GP) based on experience collected during previous trials as a function of system state, input, and other relevant variables. Nominal state sequences are predicted using an Unscented Transform and worst-case scenarios are defined as sequences bounding the 3σ confidence region. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, GPS-denied environments. The paper presents experimental results from testing on a 50 kg skid-steered robot executing a path-tracking task. The results show reductions in maximum lateral and heading path-tracking errors by up to 30% and a clear transition from robust control when the model uncertainty is high to optimal control when model uncertainty is reduced.

@INPROCEEDINGS{ostafew-icra15,
author = {Chris J. Ostafew and Angela P. Schoellig and Timothy D. Barfoot},
title = {Conservative to confident: treating uncertainty robustly within learning-based control},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
pages = {421--427},
year = {2015},
doi = {10.1109/ICRA.2015.7139033},
note = {},
abstract = {Robust control maintains stability and performance for a fixed amount of model uncertainty but can be conservative since the model is not updated online. Learning- based control, on the other hand, uses data to improve the model over time but is not typically guaranteed to be robust throughout the process. This paper proposes a novel combination of both ideas: a robust Min-Max Learning-Based Nonlinear Model Predictive Control (MM-LB-NMPC) algorithm. Based on an existing LB-NMPC algorithm, we present an efficient and robust extension, altering the NMPC performance objective to optimize for the worst-case scenario. The algorithm uses a simple a priori vehicle model and a learned disturbance model. Disturbances are modelled as a Gaussian Process (GP) based on experience collected during previous trials as a function of system state, input, and other relevant variables. Nominal state sequences are predicted using an Unscented Transform and worst-case scenarios are defined as sequences bounding the 3σ confidence region. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, GPS-denied environments. The paper presents experimental results from testing on a 50 kg skid-steered robot executing a path-tracking task. The results show reductions in maximum lateral and heading path-tracking errors by up to 30% and a clear transition from robust control when the model uncertainty is high to optimal control when model uncertainty is reduced.}
}

University of Toronto Institute for Aerospace Studies