Learning Control Theory and Foundations

Learning algorithms hold great promise for improving a robot’s performance whenever a priori models are not sufficiently accurate. We have developed learning controllers of varying complexity, ranging from controllers that improve the execution of a specific task by iteratively updating the reference input to task-independent schemes that update the underlying robot model whenever new data becomes available. All of our learning controllers share the following characteristics:

  • they combine a priori model information with experimental data,
  • they make no major a priori assumptions about the unknown effects to be learned, and
  • they have been tested extensively on state-of-the-art robotic platforms.
Our algorithms combine fundamental concepts from control theory (e.g., optimal filtering and model predictive control) and machine learning (e.g., Gaussian processes) with computational tools from optimization (e.g., convex problem solvers). The results are fast-converging, computationally efficient, and practical learning algorithms and first-of-their-kind robot demonstrations. We have demonstrated both (i) full 3D motion learning on quadrotor vehicles and (ii) outdoor learning experiments that increased the tracking accuracy and speed of a ground robot navigating based on vision only. See also our robot racing page.

Related Publications

Model Learning with Gaussian Processes

Learning-based nonlinear model predictive control to improve vision-based mobile robot path tracking
C. J. Ostafew, J. Collier, A. P. Schoellig, and T. D. Barfoot
Journal of Field Robotics, vol. 33, iss. 1, pp. 133-152, 2015.
This paper presents a Learning-based Nonlinear Model Predictive Control (LB-NMPC) algorithm to achieve high-performance path tracking in challenging off-road terrain through learning. The LB-NMPC algorithm uses a simple a priori vehicle model and a learned disturbance model. Disturbances are modelled as a Gaussian Process (GP) as a function of system state, input, and other relevant variables. The GP is updated based on experience collected during previous trials. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, GPS-denied environments. The paper presents experimental results including over 3.0 km of travel by three significantly different robot platforms with masses ranging from 50 kg to 600 kg and at speeds ranging from 0.35 m/s to 1.2 m/s. Planned speeds are generated by a novel experience-based speed scheduler that balances overall travel time, path-tracking errors, and localization reliability. The results show that the controller can start from a generic a priori vehicle model and subsequently learn to reduce vehicle- and trajectory-specific path-tracking errors based on experience.

@ARTICLE{ostafew-jfr15,
author = {Chris J. Ostafew and Jack Collier and Angela P. Schoellig and Timothy D. Barfoot},
title = {Learning-based nonlinear model predictive control to improve vision-based mobile robot path tracking},
year = {2015},
journal = {{Journal of Field Robotics}},
volume = {33},
number = {1},
pages = {133--152},
doi = {10.1002/rob.21587},
urlvideo={https://youtu.be/lxm-2A6yOY0?list=PLC12E387419CEAFF2},
urlvideo2={https://youtu.be/M9xhkHCzpMo?list=PL0F1AD87C0266A961},
urlvideo3={http://youtu.be/MwVElAn95-M?list=PLC0E5EB919968E507},
urlvideo4={http://youtu.be/Pu3_F6k6Fa4?list=PLC0E5EB919968E507},
abstract = {This paper presents a Learning-based Nonlinear Model Predictive Control (LB-NMPC) algorithm to achieve high-performance path tracking in challenging off-road terrain through learning. The LB-NMPC algorithm uses a simple a priori vehicle model and a learned disturbance model. Disturbances are modelled as a Gaussian Process (GP) as a function of system state, input, and other relevant variables. The GP is updated based on experience collected during previous trials. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, GPS-denied environments. The paper presents experimental results including over 3.0 km of travel by three significantly different robot platforms with masses ranging from 50 kg to 600 kg and at speeds ranging from 0.35 m/s to 1.2 m/s. Planned speeds are generated by a novel experience-based speed scheduler that balances overall travel time, path-tracking errors, and localization reliability. The results show that the controller can start from a generic a priori vehicle model and subsequently learn to reduce vehicle- and trajectory-specific path-tracking errors based on experience.}
}

Learning-based nonlinear model predictive control to improve vision-based mobile robot path-tracking in challenging outdoor environments
C. J. Ostafew, A. P. Schoellig, and T. D. Barfoot
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2014, pp. 4029-4036.

This paper presents a Learning-based Nonlinear Model Predictive Control (LB-NMPC) algorithm for an autonomous mobile robot to reduce path-tracking errors over repeated traverses along a reference path. The LB-NMPC algorithm uses a simple a priori vehicle model and a learned disturbance model. Disturbances are modelled as a Gaussian Process (GP) based on experience collected during previous traversals as a function of system state, input and other relevant variables. Modelling the disturbance as a GP enables interpolation and extrapolation of learned disturbances, a key feature of this algorithm. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, GPS-denied environments. The paper presents experimental results including over 1.8 km of travel by a four-wheeled, 50 kg robot travelling through challenging terrain (including steep, uneven hills) and by a six-wheeled, 160 kg robot learning disturbances caused by unmodelled dynamics at speeds ranging from 0.35 m/s to 1.0 m/s. The speed is scheduled to balance trial time, path-tracking errors, and localization reliability based on previous experience. The results show that the system can start from a generic a priori vehicle model and subsequently learn to reduce vehicle- and trajectory-specific path-tracking errors based on experience.

@INPROCEEDINGS{ostafew-icra14,
author = {Chris J. Ostafew and Angela P. Schoellig and Timothy D. Barfoot},
title = {Learning-based nonlinear model predictive control to improve vision-based mobile robot path-tracking in challenging outdoor environments},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
pages = {4029--4036},
year = {2014},
doi = {10.1109/ICRA.2014.6907444},
urlvideo = {https://youtu.be/MwVElAn95-M?list=PLC12E387419CEAFF2},
abstract = {This paper presents a Learning-based Nonlinear Model Predictive Control (LB-NMPC) algorithm for an autonomous mobile robot to reduce path-tracking errors over repeated traverses along a reference path. The LB-NMPC algorithm uses a simple a priori vehicle model and a learned disturbance model. Disturbances are modelled as a Gaussian Process (GP) based on experience collected during previous traversals as a function of system state, input and other relevant variables. Modelling the disturbance as a GP enables interpolation and extrapolation of learned disturbances, a key feature of this algorithm. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, GPS-denied environments. The paper presents experimental results including over 1.8 km of travel by a four-wheeled, 50 kg robot travelling through challenging terrain (including steep, uneven hills) and by a six-wheeled, 160 kg robot learning disturbances caused by unmodelled dynamics at speeds ranging from 0.35 m/s to 1.0 m/s. The speed is scheduled to balance trial time, path-tracking errors, and localization reliability based on previous experience. The results show that the system can start from a generic a priori vehicle model and subsequently learn to reduce vehicle- and trajectory-specific path-tracking errors based on experience.}
}
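
The two LB-NMPC abstracts above describe disturbances modelled as a Gaussian Process over system state, input, and other variables. The following is a minimal sketch of that idea, not the authors' implementation: it fits a zero-mean GP with a squared-exponential kernel to (state, input) pairs and returns a predictive mean and variance. The kernel choice, the hyperparameter values, the class name `GPDisturbanceModel`, and the toy function `true_disturbance` are all assumptions made for illustration.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale, signal_var):
    """Squared-exponential kernel between row-wise inputs A (n, d) and B (m, d)."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


class GPDisturbanceModel:
    """Hypothetical zero-mean GP mapping z = (state, input) to a scalar disturbance."""

    def __init__(self, length_scale=1.0, signal_var=1.0, noise_var=1e-2):
        self.length_scale = length_scale
        self.signal_var = signal_var
        self.noise_var = noise_var

    def fit(self, Z, d):
        """Store experience (Z, d) and precompute the Cholesky factor of K + noise."""
        self.Z = np.asarray(Z, dtype=float)
        d = np.asarray(d, dtype=float)
        K = sq_exp_kernel(self.Z, self.Z, self.length_scale, self.signal_var)
        K += self.noise_var * np.eye(len(self.Z))
        self.chol = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.chol.T, np.linalg.solve(self.chol, d))
        return self

    def predict(self, Z_query):
        """Return the predictive mean and variance of the disturbance at Z_query."""
        Z_query = np.asarray(Z_query, dtype=float)
        K_star = sq_exp_kernel(Z_query, self.Z, self.length_scale, self.signal_var)
        mean = K_star @ self.alpha
        v = np.linalg.solve(self.chol, K_star.T)
        var = self.signal_var - np.sum(v**2, axis=0)
        return mean, np.maximum(var, 0.0)


def true_disturbance(Z):
    """Toy disturbance used only to generate example data (an assumption)."""
    return 0.5 * np.sin(Z[:, 0]) + 0.2 * Z[:, 1]
```

In a sketch like this, the predicted mean would be added to the a priori vehicle model inside the predictive controller. The variance grows away from previously visited states and inputs, which is the quantity a robust scheme could use to decide how cautious to be.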

Safe and Robust Learning Control

Conservative to confident: treating uncertainty robustly within learning-based control
C. J. Ostafew, A. P. Schoellig, and T. D. Barfoot
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2015, pp. 421-427.
Robust control maintains stability and performance for a fixed amount of model uncertainty but can be conservative since the model is not updated online. Learning-based control, on the other hand, uses data to improve the model over time but is not typically guaranteed to be robust throughout the process. This paper proposes a novel combination of both ideas: a robust Min-Max Learning-Based Nonlinear Model Predictive Control (MM-LB-NMPC) algorithm. Based on an existing LB-NMPC algorithm, we present an efficient and robust extension, altering the NMPC performance objective to optimize for the worst-case scenario. The algorithm uses a simple a priori vehicle model and a learned disturbance model. Disturbances are modelled as a Gaussian Process (GP) based on experience collected during previous trials as a function of system state, input, and other relevant variables. Nominal state sequences are predicted using an Unscented Transform and worst-case scenarios are defined as sequences bounding the 3σ confidence region. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, GPS-denied environments. The paper presents experimental results from testing on a 50 kg skid-steered robot executing a path-tracking task. The results show reductions in maximum lateral and heading path-tracking errors by up to 30% and a clear transition from robust control when the model uncertainty is high to optimal control when model uncertainty is reduced.

@INPROCEEDINGS{ostafew-icra15,
author = {Chris J. Ostafew and Angela P. Schoellig and Timothy D. Barfoot},
title = {Conservative to confident: treating uncertainty robustly within learning-based control},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
pages = {421--427},
year = {2015},
doi = {10.1109/ICRA.2015.7139033},
abstract = {Robust control maintains stability and performance for a fixed amount of model uncertainty but can be conservative since the model is not updated online. Learning-based control, on the other hand, uses data to improve the model over time but is not typically guaranteed to be robust throughout the process. This paper proposes a novel combination of both ideas: a robust Min-Max Learning-Based Nonlinear Model Predictive Control (MM-LB-NMPC) algorithm. Based on an existing LB-NMPC algorithm, we present an efficient and robust extension, altering the NMPC performance objective to optimize for the worst-case scenario. The algorithm uses a simple a priori vehicle model and a learned disturbance model. Disturbances are modelled as a Gaussian Process (GP) based on experience collected during previous trials as a function of system state, input, and other relevant variables. Nominal state sequences are predicted using an Unscented Transform and worst-case scenarios are defined as sequences bounding the 3σ confidence region. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, GPS-denied environments. The paper presents experimental results from testing on a 50 kg skid-steered robot executing a path-tracking task. The results show reductions in maximum lateral and heading path-tracking errors by up to 30% and a clear transition from robust control when the model uncertainty is high to optimal control when model uncertainty is reduced.}
}

Safe and robust learning control with Gaussian processes
F. Berkenkamp and A. P. Schoellig
in Proc. of the European Control Conference (ECC), 2015, pp. 2501-2506.

This paper introduces a learning-based robust control algorithm that provides robust stability and performance guarantees during learning. The approach uses Gaussian process (GP) regression based on data gathered during operation to update an initial model of the system and to gradually decrease the uncertainty related to this model. Embedding this data-based update scheme in a robust control framework guarantees stability during the learning process. Traditional robust control approaches have not considered online adaptation of the model and its uncertainty before. As a result, their controllers do not improve performance during operation. Typical machine learning algorithms that have achieved similar high-performance behavior by adapting the model and controller online do not provide the guarantees presented in this paper. In particular, this paper considers a stabilization task, linearizes the nonlinear, GP-based model around a desired operating point, and solves a convex optimization problem to obtain a linear robust controller. The resulting performance improvements due to the learning-based controller are demonstrated in experiments on a quadrotor vehicle.

@INPROCEEDINGS{berkenkamp-ecc15,
author = {Felix Berkenkamp and Angela P. Schoellig},
title = {Safe and robust learning control with {G}aussian processes},
booktitle = {{Proc. of the European Control Conference (ECC)}},
pages = {2501--2506},
year = {2015},
doi = {10.1109/ECC.2015.7330913},
urlvideo={https://youtu.be/YqhLnCm0KXY?list=PLC12E387419CEAFF2},
urlslides={../../wp-content/papercite-data/slides/berkenkamp-ecc15-slides.pdf},
abstract = {This paper introduces a learning-based robust control algorithm that provides robust stability and performance guarantees during learning. The approach uses Gaussian process (GP) regression based on data gathered during operation to update an initial model of the system and to gradually decrease the uncertainty related to this model. Embedding this data-based update scheme in a robust control framework guarantees stability during the learning process. Traditional robust control approaches have not considered online adaptation of the model and its uncertainty before. As a result, their controllers do not improve performance during operation. Typical machine learning algorithms that have achieved similar high-performance behavior by adapting the model and controller online do not provide the guarantees presented in this paper. In particular, this paper considers a stabilization task, linearizes the nonlinear, GP-based model around a desired operating point, and solves a convex optimization problem to obtain a linear robust controller. The resulting performance improvements due to the learning-based controller are demonstrated in experiments on a quadrotor vehicle.}
}

Learning of Feed-Forward Corrections

Design of norm-optimal iterative learning controllers: the effect of an iteration-domain Kalman filter for disturbance estimation
N. Degen and A. P. Schoellig
in Proc. of the IEEE Conference on Decision and Control (CDC), 2014, pp. 3590-3596.
Iterative learning control (ILC) has proven to be an effective method for improving the performance of repetitive control tasks. This paper revisits two optimization-based ILC algorithms: (i) the widely used quadratic-criterion ILC law (QILC) and (ii) an estimation-based ILC law using an iteration-domain Kalman filter (K-ILC). The goal of this paper is to analytically compare both algorithms and to highlight the advantages of the Kalman-filter-enhanced algorithm. We first show that for an iteration-constant estimation gain and an appropriate choice of learning parameters both algorithms are identical. We then show that the estimation-enhanced algorithm with its iteration-varying optimal Kalman gains can achieve both fast initial convergence and good noise rejection by (optimally) adapting the learning update rule over the course of an experiment. We conclude that the clear separation of disturbance estimation and input update of the K-ILC algorithm provides an intuitive architecture to design learning schemes that achieve both low noise sensitivity and fast convergence. To benchmark the algorithms we use a simulation of a single-input, single-output mass-spring-damper system.

@INPROCEEDINGS{degen-cdc14,
author = {Nicolas Degen and Angela P. Schoellig},
title = {Design of norm-optimal iterative learning controllers: the effect of an iteration-domain {K}alman filter for disturbance estimation},
booktitle = {{Proc. of the IEEE Conference on Decision and Control (CDC)}},
pages = {3590--3596},
year = {2014},
doi = {10.1109/CDC.2014.7039947},
urlslides = {../../wp-content/papercite-data/slides/degen-cdc14-slides.pdf},
abstract = {Iterative learning control (ILC) has proven to be an effective method for improving the performance of repetitive control tasks. This paper revisits two optimization-based ILC algorithms: (i) the widely used quadratic-criterion ILC law (QILC) and (ii) an estimation-based ILC law using an iteration-domain Kalman filter (K-ILC). The goal of this paper is to analytically compare both algorithms and to highlight the advantages of the Kalman-filter-enhanced algorithm. We first show that for an iteration-constant estimation gain and an appropriate choice of learning parameters both algorithms are identical. We then show that the estimation-enhanced algorithm with its iteration-varying optimal Kalman gains can achieve both fast initial convergence and good noise rejection by (optimally) adapting the learning update rule over the course of an experiment. We conclude that the clear separation of disturbance estimation and input update of the K-ILC algorithm provides an intuitive architecture to design learning schemes that achieve both low noise sensitivity and fast convergence. To benchmark the algorithms we use a simulation of a single-input, single-output mass-spring-damper system.}
}
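
The abstract above separates disturbance estimation from the input update and benchmarks on a simulated mass-spring-damper system. The sketch below illustrates that separation under simplifying assumptions and is not the paper's implementation: the plant is a forward-Euler discretization with made-up parameters, the repetitive disturbance is an unknown constant force, and all covariances are scalar multiples of the identity so the Kalman gain reduces to one scalar per iteration. The function names `lifted_model` and `kalman_ilc` are hypothetical.

```python
import numpy as np


def lifted_model(n_steps, dt=0.05, mass=1.0, damping=0.5, stiffness=2.0):
    """Lifted map F from the force sequence u[0..N-1] to positions y[1..N]."""
    A = np.eye(2) + dt * np.array([[0.0, 1.0],
                                   [-stiffness / mass, -damping / mass]])
    B = np.array([0.0, dt / mass])
    C = np.array([1.0, 0.0])
    markov = np.empty(n_steps)
    A_pow_B = B.copy()
    for k in range(n_steps):
        markov[k] = C @ A_pow_B          # Markov parameter C A^k B
        A_pow_B = A @ A_pow_B
    F = np.zeros((n_steps, n_steps))
    for i in range(n_steps):
        F[i, : i + 1] = markov[i::-1]    # lower-triangular Toeplitz structure
    return F


def kalman_ilc(F, force_disturbance, n_iters=10, noise_std=0.01,
               p0=1.0, q=1e-8, seed=0):
    """Run ILC with an iteration-domain Kalman filter estimating the disturbance."""
    rng = np.random.default_rng(seed)
    n = F.shape[0]
    d_true = F @ force_disturbance       # repetitive output disturbance (unknown)
    r = noise_std**2                     # measurement noise variance
    u = np.zeros(n)                      # feed-forward correction
    d_hat = np.zeros(n)                  # disturbance estimate
    p = p0                               # scalar estimate variance
    error_norms, gains = [], []
    for _ in range(n_iters):
        # One trial: measured deviation from the desired trajectory.
        y = F @ u + d_true + noise_std * rng.standard_normal(n)
        error_norms.append(float(np.linalg.norm(y)))
        # Estimation step: d is modelled as a slow random walk over iterations.
        p = p + q
        gain = p / (p + r)
        d_hat = d_hat + gain * (y - F @ u - d_hat)
        p = (1.0 - gain) * p
        gains.append(gain)
        # Input update: least-squares cancellation of the estimated disturbance.
        u = -np.linalg.lstsq(F, d_hat, rcond=None)[0]
    return error_norms, gains
```

With these settings the gain starts close to one, so the first trial already yields a usable estimate, and then shrinks so that later trials average out measurement noise. Fixing the gain to a constant would remove this trade-off between fast initial convergence and noise rejection.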

Speed daemon: experience-based mobile robot speed scheduling
C. J. Ostafew, A. P. Schoellig, T. D. Barfoot, and J. Collier
in Proc. of the International Conference on Computer and Robot Vision (CRV), 2014, pp. 56-62. Best Robotics Paper Award.

A time-optimal speed schedule results in a mobile robot driving along a planned path at or near the limits of the robot’s capability. However, deriving models to predict the effect of increased speed can be very difficult. In this paper, we present a speed scheduler that uses previous experience, instead of complex models, to generate time-optimal speed schedules. The algorithm is designed for a vision-based, path-repeating mobile robot and uses experience to ensure reliable localization, low path-tracking errors, and realizable control inputs while maximizing the speed along the path. To our knowledge, this is the first speed scheduler to incorporate experience from previous path traversals in order to address system constraints. The proposed speed scheduler was tested in over 4 km of path traversals in outdoor terrain using a large Ackermann-steered robot travelling between 0.5 m/s and 2.0 m/s. The approach to speed scheduling is shown to generate fast speed schedules while remaining within the limits of the robot’s capability.

@INPROCEEDINGS{ostafew-crv14,
author = {Chris J. Ostafew and Angela P. Schoellig and Timothy D. Barfoot and Jack Collier},
title = {Speed daemon: experience-based mobile robot speed scheduling},
booktitle = {{Proc. of the International Conference on Computer and Robot Vision (CRV)}},
pages = {56--62},
year = {2014},
doi = {10.1109/CRV.2014.16},
urlvideo = {https://youtu.be/Pu3_F6k6Fa4?list=PLC12E387419CEAFF2},
abstract = {A time-optimal speed schedule results in a mobile robot driving along a planned path at or near the limits of the robot's capability. However, deriving models to predict the effect of increased speed can be very difficult. In this paper, we present a speed scheduler that uses previous experience, instead of complex models, to generate time-optimal speed schedules. The algorithm is designed for a vision-based, path-repeating mobile robot and uses experience to ensure reliable localization, low path-tracking errors, and realizable control inputs while maximizing the speed along the path. To our knowledge, this is the first speed scheduler to incorporate experience from previous path traversals in order to address system constraints. The proposed speed scheduler was tested in over 4 km of path traversals in outdoor terrain using a large Ackermann-steered robot travelling between 0.5 m/s and 2.0 m/s. The approach to speed scheduling is shown to generate fast speed schedules while remaining within the limits of the robot's capability.},
note = {Best Robotics Paper Award}
}

Improving tracking performance by learning from past data
A. P. Schoellig
PhD Thesis, Diss. ETH No. 20593, Institute for Dynamic Systems and Control, ETH Zurich, Switzerland, 2013. Awards: ETH Medal, Dimitris N. Chorafas Foundation Prize.

@PHDTHESIS{schoellig-eth13,
author = {Angela P. Schoellig},
title = {Improving tracking performance by learning from past data},
school = {Diss. ETH No. 20593, Institute for Dynamic Systems and Control, ETH Zurich},
doi = {10.3929/ethz-a-009758916},
year = {2013},
address = {Switzerland},
urlabstract = {../../wp-content/papercite-data/pdf/schoellig-eth13-abstract.pdf},
urlslides = {../../wp-content/papercite-data/slides/schoellig-eth13-slides.pdf},
urlvideo = {https://youtu.be/zHTCsSkmADo?list=PLC12E387419CEAFF2},
urlvideo2 = {https://youtu.be/7r281vgfotg?list=PLD6AAACCBFFE64AC5},
note = {Awards: ETH Medal, Dimitris N. Chorafas Foundation Prize}
}

Visual teach and repeat, repeat, repeat: iterative learning control to improve mobile robot path tracking in challenging outdoor environments
C. J. Ostafew, A. P. Schoellig, and T. D. Barfoot
in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013, pp. 176-181.

This paper presents a path-repeating, mobile robot controller that combines a feedforward, proportional Iterative Learning Control (ILC) algorithm with a feedback-linearized path-tracking controller to reduce path-tracking errors over repeated traverses along a reference path. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, GPS-denied, extreme environments. The paper presents experimental results including over 600 m of travel by a four-wheeled, 50 kg robot travelling through challenging terrain including steep hills and sandy turns and by a six-wheeled, 160 kg robot at gradually-increased speeds up to three times faster than the nominal, safe speed. In the absence of a global localization system, ILC is demonstrated to reduce path-tracking errors caused by unmodelled robot dynamics and terrain challenges.

@INPROCEEDINGS{ostafew-iros13,
author = {Chris J. Ostafew and Angela P. Schoellig and Timothy D. Barfoot},
title = {Visual teach and repeat, repeat, repeat: Iterative learning control to improve mobile robot path tracking in challenging outdoor environments},
booktitle = {{Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}},
pages = {176--181},
year = {2013},
doi = {10.1109/IROS.2013.6696350},
urlvideo = {https://youtu.be/08_d1HSPADA?list=PLC12E387419CEAFF2},
abstract = {This paper presents a path-repeating, mobile robot controller that combines a feedforward, proportional Iterative Learning Control (ILC) algorithm with a feedback-linearized path-tracking controller to reduce path-tracking errors over repeated traverses along a reference path. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, GPS-denied, extreme environments. The paper presents experimental results including over 600 m of travel by a four-wheeled, 50 kg robot travelling through challenging terrain including steep hills and sandy turns and by a six-wheeled, 160 kg robot at gradually-increased speeds up to three times faster than the nominal, safe speed. In the absence of a global localization system, ILC is demonstrated to reduce path-tracking errors caused by unmodelled robot dynamics and terrain challenges.}
}
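
The abstract above mentions a feedforward, proportional ILC algorithm. As a rough illustration of a proportional ILC update in general, and not of the robot controller in the paper, the sketch below applies the rule u_{j+1}[k] = u_j[k] + gain * e_j[k+1] to a hypothetical first-order plant with a disturbance that repeats in every trial. The plant parameters `a` and `b`, the learning gain, and the function names are assumptions chosen so that the error contracts from trial to trial.

```python
import numpy as np


def run_trial(u, disturbance, a=0.3, b=1.0):
    """Simulate one trial of y[k+1] = a*y[k] + b*u[k] + disturbance[k], y[0] = 0."""
    y = np.zeros(len(u) + 1)
    for k in range(len(u)):
        y[k + 1] = a * y[k] + b * u[k] + disturbance[k]
    return y


def proportional_ilc(reference, disturbance, gain=0.7, n_trials=15):
    """Repeat the same task and correct the feed-forward input after each trial."""
    u = np.zeros(len(reference) - 1)
    rms_errors = []
    for _ in range(n_trials):
        y = run_trial(u, disturbance)
        e = reference - y
        rms_errors.append(float(np.sqrt(np.mean(e[1:] ** 2))))
        # u[k] acts on y[k+1], so it is corrected with the error one step later.
        u = u + gain * e[1:]
    return u, rms_errors
```

For example, with `reference = np.sin(np.linspace(0.0, 2.0 * np.pi, 41))` and a constant disturbance of 0.2, the RMS tracking error shrinks from trial to trial because the same disturbance recurs and is gradually absorbed into the feed-forward input.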

Feed-forward parameter identification for precise periodic quadrocopter motions
A. P. Schoellig, C. Wiltsche, and R. D’Andrea
in Proc. of the American Control Conference (ACC), 2012, pp. 4313-4318.

This paper presents an approach for precisely tracking periodic trajectories with a quadrocopter. In order to improve temporal and spatial tracking performance, we propose a feed-forward strategy that adapts the motion parameters sent to the vehicle controller. The motion parameters are either adjusted on the fly or, in order to avoid initial transients, identified prior to the flight performance. We outline an identification scheme that tunes parameters for a large class of periodic motions, and requires only a small number of identification experiments prior to flight. This reduced identification is based on analysis and experiments showing that the quadrocopter’s closed-loop dynamics can be approximated by three directionally decoupled linear systems. We show the effectiveness of this approach by performing a sequence of periodic motions on real quadrocopters using the tuned parameters obtained by the reduced identification.

@INPROCEEDINGS{schoellig-acc12,
author = {Angela P. Schoellig and Clemens Wiltsche and Raffaello D'Andrea},
title = {Feed-forward parameter identification for precise periodic quadrocopter motions},
booktitle = {{Proc. of the American Control Conference (ACC)}},
pages = {4313--4318},
year = {2012},
doi = {10.1109/ACC.2012.6315248},
urlvideo = {http://tiny.cc/MusicInMotion},
urlslides = {../../wp-content/papercite-data/slides/schoellig-acc12-slides.pdf},
abstract = {This paper presents an approach for precisely tracking periodic trajectories with a quadrocopter. In order to improve temporal and spatial tracking performance, we propose a feed-forward strategy that adapts the motion parameters sent to the vehicle controller. The motion parameters are either adjusted on the fly or, in order to avoid initial transients, identified prior to the flight performance. We outline an identification scheme that tunes parameters for a large class of periodic motions, and requires only a small number of identification experiments prior to flight. This reduced identification is based on analysis and experiments showing that the quadrocopter's closed-loop dynamics can be approximated by three directionally decoupled linear systems. We show the effectiveness of this approach by performing a sequence of periodic motions on real quadrocopters using the tuned parameters obtained by the reduced identification.}
}

Iterative learning of feed-forward corrections for high-performance tracking
F. L. Mueller, A. P. Schoellig, and R. D’Andrea
in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012, pp. 3276-3281.

We revisit a recently developed iterative learning algorithm that enables systems to learn from a repeated operation with the goal of achieving high tracking performance of a given trajectory. The learning scheme is based on a coarse dynamics model of the system and uses past measurements to iteratively adapt the feed-forward input signal to the system. The novelty of this work is an identification routine that uses a numerical simulation of the system dynamics to extract the required model information. This allows the learning algorithm to be applied to any dynamic system for which a dynamics simulation is available (including systems with underlying feedback loops). The proposed learning algorithm is applied to a quadrocopter system that is guided by a trajectory-following controller. With the identification routine, we are able to extend our previous learning results to three-dimensional quadrocopter motions and achieve significantly higher tracking accuracy due to the underlying feedback control, which accounts for non-repetitive noise.

@INPROCEEDINGS{mueller-iros12,
author = {Fabian L. Mueller and Angela P. Schoellig and Raffaello D'Andrea},
title = {Iterative learning of feed-forward corrections for high-performance tracking},
booktitle = {{Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}},
pages = {3276--3281},
year = {2012},
doi = {10.1109/IROS.2012.6385647},
urlvideo = {https://youtu.be/zHTCsSkmADo?list=PLC12E387419CEAFF2},
urlslides = {../../wp-content/papercite-data/slides/mueller-iros12-slides.pdf},
abstract = {We revisit a recently developed iterative learning algorithm that enables systems to learn from a repeated operation with the goal of achieving high tracking performance of a given trajectory. The learning scheme is based on a coarse dynamics model of the system and uses past measurements to iteratively adapt the feed-forward input signal to the system. The novelty of this work is an identification routine that uses a numerical simulation of the system dynamics to extract the required model information. This allows the learning algorithm to be applied to any dynamic system for which a dynamics simulation is available (including systems with underlying feedback loops). The proposed learning algorithm is applied to a quadrocopter system that is guided by a trajectory-following controller. With the identification routine, we are able to extend our previous learning results to three-dimensional quadrocopter motions and achieve significantly higher tracking accuracy due to the underlying feedback control, which accounts for non-repetitive noise.}
}

Optimization-based iterative learning for precise quadrocopter trajectory tracking
A. P. Schoellig, F. L. Mueller, and R. D’Andrea
Autonomous Robots, vol. 33, iss. 1-2, pp. 103-127, 2012.

Current control systems regulate the behavior of dynamic systems by reacting to noise and unexpected disturbances as they occur. To improve the performance of such control systems, experience from iterative executions can be used to anticipate recurring disturbances and proactively compensate for them. This paper presents an algorithm that exploits data from previous repetitions in order to learn to precisely follow a predefined trajectory. We adapt the feed-forward input signal to the system with the goal of achieving high tracking performance – even under the presence of model errors and other recurring disturbances. The approach is based on a dynamics model that captures the essential features of the system and that explicitly takes system input and state constraints into account. We combine traditional optimal filtering methods with state-of-the-art optimization techniques in order to obtain an effective and computationally efficient learning strategy that updates the feed-forward input signal according to a customizable learning objective. It is possible to define a termination condition that stops an execution early if the deviation from the nominal trajectory exceeds a given bound. This allows for a safe learning that gradually extends the time horizon of the trajectory. We developed a framework for generating arbitrary flight trajectories and for applying the algorithm to highly maneuverable autonomous quadrotor vehicles in the ETH Flying Machine Arena testbed. Experimental results are discussed for selected trajectories and different learning algorithm parameters.

@ARTICLE{schoellig-auro12,
author = {Angela P. Schoellig and Fabian L. Mueller and Raffaello D'Andrea},
title = {Optimization-based iterative learning for precise quadrocopter trajectory tracking},
journal = {{Autonomous Robots}},
volume = {33},
number = {1-2},
pages = {103--127},
year = {2012},
doi = {10.1007/s10514-012-9283-2},
urlvideo = {http://youtu.be/goVuP5TJIUU?list=PLC12E387419CEAFF2},
abstract = {Current control systems regulate the behavior of dynamic systems by reacting to noise and unexpected disturbances as they occur. To improve the performance of such control systems, experience from iterative executions can be used to anticipate recurring disturbances and proactively compensate for them. This paper presents an algorithm that exploits data from previous repetitions in order to learn to precisely follow a predefined trajectory. We adapt the feed-forward input signal to the system with the goal of achieving high tracking performance - even in the presence of model errors and other recurring disturbances. The approach is based on a dynamics model that captures the essential features of the system and that explicitly takes system input and state constraints into account. We combine traditional optimal filtering methods with state-of-the-art optimization techniques in order to obtain an effective and computationally efficient learning strategy that updates the feed-forward input signal according to a customizable learning objective. It is possible to define a termination condition that stops an execution early if the deviation from the nominal trajectory exceeds a given bound. This allows for safe learning that gradually extends the time horizon of the trajectory. We developed a framework for generating arbitrary flight trajectories and for applying the algorithm to highly maneuverable autonomous quadrotor vehicles in the ETH Flying Machine Arena testbed. Experimental results are discussed for selected trajectories and different learning algorithm parameters.}
}
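
To make the learning loop described in this abstract concrete, here is a minimal sketch under strong simplifying assumptions. It is not the paper's algorithm: the scalar plant, the model mismatch, the noise levels, and the scalar Kalman-style gain are hypothetical, and the constrained convex optimization step is replaced by an unconstrained linear solve. It only shows the pattern of estimating a recurring disturbance from the last trial and then recomputing the feed-forward input.

```python
import numpy as np

def lifted_model(a, b, n):
    """Lifted map from inputs u[0..n-1] to outputs x[1..n] of x[k+1] = a*x[k] + b*u[k]."""
    F = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            F[i, j] = a ** (i - j) * b
    return F

def run_ilc(n=50, iterations=10, seed=0):
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n)
    F_model = lifted_model(0.90, 0.5, n)   # coarse a-priori model (assumed)
    F_true = lifted_model(0.85, 0.6, n)    # "real" system, hidden from the learner
    d_true = 0.2 * np.sin(np.pi * t)       # recurring disturbance, also hidden
    y_des = np.sin(2.0 * np.pi * t)        # desired trajectory

    d_hat = np.zeros(n)                    # estimate of the recurring disturbance
    p, q, r = 1.0, 1e-3, 1e-4              # estimate variance, process and measurement noise
    errors = []
    for _ in range(iterations):
        # Input update: unconstrained stand-in for the optimization step.
        u = np.linalg.solve(F_model, y_des - d_hat)
        # Execute one trial on the "real" system with non-repetitive noise.
        y = F_true @ u + d_true + rng.normal(0.0, 0.01, n)
        errors.append(np.linalg.norm(y_des - y))
        # Estimation update: scalar Kalman-style correction of d_hat.
        p += q
        k = p / (p + r)
        d_hat = d_hat + k * (y - F_model @ u - d_hat)
        p = (1.0 - k) * p
    return errors

errors = run_ilc()
```

In this toy setting the tracking error shrinks over the trials until it is dominated by the non-repetitive noise, which the learning step cannot anticipate.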

Optimization-based iterative learning control for trajectory tracking
A. P. Schoellig and R. D’Andrea
in Proc. of the European Control Conference (ECC), 2009, pp. 1505-1510.

In this paper, an optimization-based iterative learning control approach is presented. Given a desired trajectory to be followed, the proposed learning algorithm improves the system performance from trial to trial by exploiting the experience gained from previous repetitions. Taking advantage of the a-priori knowledge about the system's dominating dynamics, a data-based update rule is derived which adapts the feedforward input signal after each trial. By combining traditional model-based optimal filtering methods with state-of-the-art optimization techniques such as convex programming, an effective and computationally highly efficient learning strategy is obtained. Moreover, the derived formalism allows for the direct treatment of input and state constraints. Different (nonlinear) performance objectives can be specified, defining the overall learning behavior. Finally, the proposed algorithm is successfully applied to the benchmark problem of swinging up a pendulum using open-loop control only.

@INPROCEEDINGS{schoellig-ecc09,
author = {Angela P. Schoellig and Raffaello D'Andrea},
title = {Optimization-based iterative learning control for trajectory tracking},
booktitle = {{Proc. of the European Control Conference (ECC)}},
pages = {1505--1510},
year = {2009},
urlslides = {../../wp-content/papercite-data/slides/schoellig-ecc09-slides.pdf},
urllink = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=7074619},
urlvideo = {https://youtu.be/W2gCn6aAwz4?list=PLC12E387419CEAFF2},
abstract = {In this paper, an optimization-based iterative learning control approach is presented. Given a desired trajectory to be followed, the proposed learning algorithm improves the system performance from trial to trial by exploiting the experience gained from previous repetitions. Taking advantage of the a-priori knowledge about the system's dominating dynamics, a data-based update rule is derived which adapts the feedforward input signal after each trial. By combining traditional model-based optimal filtering methods with state-of-the-art optimization techniques such as convex programming, an effective and computationally highly efficient learning strategy is obtained. Moreover, the derived formalism allows for the direct treatment of input and state constraints. Different (nonlinear) performance objectives can be specified, defining the overall learning behavior. Finally, the proposed algorithm is successfully applied to the benchmark problem of swinging up a pendulum using open-loop control only.}
}

Learning through experience — Optimizing performance by repetition
A. P. Schoellig and R. D’Andrea
Abstract and Poster, in Proc. of the Robotics Challenges for Machine Learning Workshop at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2008.

The goal of our research is to develop a strategy that enables a system executing the same task multiple times to use knowledge from previous trials to learn more about its own dynamics and enhance its future performance. Our approach, which falls in the field of iterative learning control, combines methods from two areas: traditional model-based estimation and control, and purely data-based learning.

@MISC{schoellig-iros08,
author = {Angela P. Schoellig and Raffaello D'Andrea},
title = {Learning through experience -- {O}ptimizing performance by repetition},
howpublished = {Abstract and Poster, in Proc. of the Robotics Challenges for Machine Learning Workshop at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year = {2008},
urlvideo = {https://youtu.be/W2gCn6aAwz4?list=PLC12E387419CEAFF2},
urlslides = {../../wp-content/papercite-data/slides/schoellig-iros08-slides.pdf},
urllink = {http://www.learning-robots.de/pmwiki.php/TC/IROS2008},
abstract = {The goal of our research is to develop a strategy that enables a system executing the same task multiple times to use knowledge from previous trials to learn more about its own dynamics and enhance its future performance. Our approach, which falls in the field of iterative learning control, combines methods from two areas: traditional model-based estimation and control, and purely data-based learning.}
}

Gradient-Based Learning

A simple learning strategy for high-speed quadrocopter multi-flips
S. Lupashin, A. P. Schoellig, M. Sherback, and R. D’Andrea
in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2010, pp. 1642-1648.
We describe a simple and intuitive policy gradient method for improving parametrized quadrocopter multi-flips by combining iterative experiments with information from a first-principles model. We start by formulating an N-flip maneuver as a five-step primitive with five adjustable parameters. Optimization using a low-order first-principles 2D vertical plane model of the quadrocopter yields an initial set of parameters and a corrective matrix. The maneuver is then repeatedly performed with the vehicle. At each iteration the state error at the end of the primitive is used to update the maneuver parameters via a gradient adjustment. The method is demonstrated at the ETH Zurich Flying Machine Arena testbed on quadrotor helicopters performing and improving on flips, double flips and triple flips.

@INPROCEEDINGS{lupashin-icra10,
author = {Sergei Lupashin and Angela P. Schoellig and Michael Sherback and Raffaello D'Andrea},
title = {A simple learning strategy for high-speed quadrocopter multi-flips},
booktitle = {{Proc. of the IEEE International Conference on Robotics and Automation (ICRA)}},
pages = {1642--1648},
year = {2010},
doi = {10.1109/ROBOT.2010.5509452},
urlvideo = {https://youtu.be/bWExDW9J9sA?list=PLC12E387419CEAFF2},
abstract = {We describe a simple and intuitive policy gradient method for improving parametrized quadrocopter multi-flips by combining iterative experiments with information from a first-principles model. We start by formulating an N-flip maneuver as a five-step primitive with five adjustable parameters. Optimization using a low-order first-principles 2D vertical plane model of the quadrocopter yields an initial set of parameters and a corrective matrix. The maneuver is then repeatedly performed with the vehicle. At each iteration the state error at the end of the primitive is used to update the maneuver parameters via a gradient adjustment. The method is demonstrated at the ETH Zurich Flying Machine Arena testbed on quadrotor helicopters performing and improving on flips, double flips and triple flips.}
}
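
The parameter update described in this abstract (final-state error mapped through a model-derived corrective matrix) can be sketched as follows. This is a simplified illustration, not the paper's implementation: the two-parameter "maneuver" in `final_state_error`, the model Jacobian `J_model`, and the step size are all hypothetical.

```python
import numpy as np

def final_state_error(params):
    """Hypothetical stand-in for flying the maneuver and measuring the final-state error."""
    A_true = np.array([[1.2, 0.1], [-0.2, 0.9]])   # real sensitivity, unknown to the learner
    target = np.array([1.0, -0.5])
    return A_true @ params + 0.05 * np.sin(params) - target

def learn_parameters(step_size=0.5, iterations=20):
    J_model = np.eye(2)                  # assumed first-principles sensitivity of final state to parameters
    C = np.linalg.pinv(J_model)          # corrective matrix derived from the model
    params = np.array([1.0, -0.5])       # initial parameters from the model (J_model @ p = target)
    history = []
    for _ in range(iterations):
        error = final_state_error(params)          # one "experiment"
        history.append(np.linalg.norm(error))
        params = params - step_size * C @ error    # gradient-style correction
    return params, history

params, history = learn_parameters()
```

The step size trades convergence speed against sensitivity to measurement noise and model error; the value 0.5 here is arbitrary.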

University of Toronto Institute for Aerospace Studies