Reinforcement Learning-Based Model Predictive Controller for Mobile Robots in Dynamic Environments with Safety Constraints
Abstract
This paper proposes a Reinforcement Learning-based Model Predictive Controller (RL-MPC) for mobile robots operating in dynamic environments under stringent safety constraints. The key challenges addressed are model and perception uncertainty, moving obstacles, and real-time computational limits. The proposed framework combines three components: first, a learned dynamics model with uncertainty estimation, which improves robustness in uncertain environments; second, a risk-aware MPC that uses chance constraints and Conditional Value-at-Risk (CVaR) to keep the probability of constraint violation below a predefined threshold; and third, a Control Barrier Function (CBF) safety layer that projects commanded actions back into a predefined safe set. Policy learning with Proximal Policy Optimization (PPO) or Soft Actor-Critic (SAC) is combined with reward shaping and safety shielding so that the robot prioritizes safety while still achieving task performance, and a sim-to-real strategy based on domain randomization improves robustness when transferring from simulation to the real world. The framework is evaluated in three scenarios: static obstacles, moving obstacles, and multi-agent traffic. The results show that RL-MPC reduces the safety-violation rate to ≤2%, compared with 2.8–12.3% for the baselines, increases the minimum robot-obstacle distance to approximately 0.2 m in dynamic scenarios, and achieves a 95–99% success rate without significantly increasing path length or energy consumption. Although the computational overhead grows by approximately 3–5 ms relative to classical MPC, the controller still meets the 20 ms-per-cycle budget required for real-time operation. An ablation study confirms that the CBF and the risk-based constraints are the dominant factors in preventing near-collisions. Overall, the RL-MPC framework offers a favorable trade-off between safety, efficiency, and implementation feasibility, making it a promising solution for online autonomous operation in dynamic environments.
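To make the role of the CBF safety layer concrete, the sketch below shows one common way such a layer can be realized; it is an illustrative assumption, not the paper's implementation. It assumes single-integrator dynamics (x_dot = u) and a single circular obstacle, and the names cbf_safety_filter, x_obs, d_safe, and alpha are hypothetical. A full implementation would instead solve a quadratic program (for example with OSQP) that also enforces input bounds and handles multiple obstacles.

```python
# Minimal sketch (not the paper's implementation) of a CBF safety layer for a
# single-integrator robot avoiding one circular obstacle.  The RL/MPC policy
# proposes u_nom; the filter projects it onto the half-space of velocities
# that keep the barrier h(x) = ||x - x_obs||^2 - d_safe^2 decaying no faster
# than -alpha * h(x).  Names such as cbf_safety_filter, x_obs, d_safe and
# alpha are hypothetical, chosen only for this illustration.
import numpy as np

def cbf_safety_filter(x, u_nom, x_obs, d_safe=0.2, alpha=1.0):
    """Project u_nom onto {u : dh/dt + alpha * h(x) >= 0} for x_dot = u."""
    h = np.dot(x - x_obs, x - x_obs) - d_safe**2   # barrier value
    a = 2.0 * (x - x_obs)                          # grad h(x), so dh/dt = a @ u
    b = -alpha * h                                 # safety constraint: a @ u >= b
    if a @ u_nom >= b:                             # nominal action is already safe
        return u_nom
    # Closed-form Euclidean projection of u_nom onto the half-space a @ u >= b
    return u_nom + (b - a @ u_nom) / (a @ a) * a

# Example: the nominal action drives the robot straight at an obstacle 0.3 m ahead
x, x_obs = np.array([0.0, 0.0]), np.array([0.3, 0.0])
u_safe = cbf_safety_filter(x, np.array([1.0, 0.0]), x_obs)
print(u_safe)  # forward velocity is reduced, lateral component is unchanged
```

In this worked example the nominal forward velocity of 1.0 m/s is reduced to roughly 0.08 m/s, the smallest change that satisfies the barrier condition at that state, which mirrors how the safety layer described in the abstract intervenes only when the learned policy proposes an unsafe action.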