Reinforcement Learning-Based Model Predictive Controller for Mobile Robots in Dynamic Environments with Safety Constraints
Abstract
This paper proposes a Reinforcement Learning-based Model Predictive Controller (RL-MPC) for mobile robots operating in dynamic environments under stringent safety constraints. The key challenges addressed are model and perception uncertainty, moving obstacles, and real-time computational limits. The proposed framework combines three components: first, a learned dynamics model with uncertainty estimation, which improves robustness in uncertain environments; second, a risk-aware MPC that uses chance constraints and Conditional Value-at-Risk (CVaR) to keep the probability of constraint violation below a predefined threshold; and third, a Control Barrier Function (CBF) safety layer that projects commanded actions back into a predefined safe set. Policy learning with Proximal Policy Optimization (PPO) or Soft Actor-Critic (SAC) is combined with reward shaping and safety shielding so that the robot prioritizes safety while still achieving task performance, and a sim-to-real strategy based on domain randomization improves robustness when transferring from simulation to the real world. The framework is evaluated in three scenarios: static obstacles, moving obstacles, and multi-agent traffic. The results show that RL-MPC reduces the safety-violation rate to ≤2%, compared with 2.8–12.3% for the baselines, increases the minimum robot-obstacle distance to approximately 0.2 m in dynamic scenarios, and achieves a 95–99% success rate without significantly increasing path length or energy consumption. Although the computational overhead grows by approximately 3–5 ms relative to classical MPC, the controller still meets the 20 ms-per-cycle budget required for real-time operation. An ablation study confirms that the CBF and the risk-based constraints are the dominant factors in preventing near-collisions. Overall, the RL-MPC framework offers a favorable trade-off between safety, efficiency, and implementation feasibility, making it a promising solution for online autonomous operation in dynamic environments.
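To make the role of the CBF safety layer concrete, the sketch below shows one common way such a layer can be realized; it is an illustrative assumption, not the paper's implementation. It assumes single-integrator dynamics (x_dot = u) and a single circular obstacle, and the names cbf_safety_filter, x_obs, d_safe, and alpha are hypothetical. A full implementation would instead solve a quadratic program (for example with OSQP) that also enforces input bounds and handles multiple obstacles.

```python
# Minimal sketch (not the paper's implementation) of a CBF safety layer for a
# single-integrator robot avoiding one circular obstacle.  The RL/MPC policy
# proposes u_nom; the filter projects it onto the half-space of velocities
# that keep the barrier h(x) = ||x - x_obs||^2 - d_safe^2 decaying no faster
# than -alpha * h(x).  Names such as cbf_safety_filter, x_obs, d_safe and
# alpha are hypothetical, chosen only for this illustration.
import numpy as np

def cbf_safety_filter(x, u_nom, x_obs, d_safe=0.2, alpha=1.0):
    """Project u_nom onto {u : dh/dt + alpha * h(x) >= 0} for x_dot = u."""
    h = np.dot(x - x_obs, x - x_obs) - d_safe**2   # barrier value
    a = 2.0 * (x - x_obs)                          # grad h(x), so dh/dt = a @ u
    b = -alpha * h                                 # safety constraint: a @ u >= b
    if a @ u_nom >= b:                             # nominal action is already safe
        return u_nom
    # Closed-form Euclidean projection of u_nom onto the half-space a @ u >= b
    return u_nom + (b - a @ u_nom) / (a @ a) * a

# Example: the nominal action drives the robot straight at an obstacle 0.3 m ahead
x, x_obs = np.array([0.0, 0.0]), np.array([0.3, 0.0])
u_safe = cbf_safety_filter(x, np.array([1.0, 0.0]), x_obs)
print(u_safe)  # forward velocity is reduced, lateral component is unchanged
```

In this worked example the nominal forward velocity of 1.0 m/s is reduced to roughly 0.08 m/s, the smallest change that satisfies the barrier condition at that state, which mirrors how the safety layer described in the abstract intervenes only when the learned policy proposes an unsafe action.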