Trajectory tracking control of intelligent vehicles based on deep reinforcement learning and rolling horizon optimization
-
Abstract: To improve the generalization of intelligent-vehicle trajectory tracking strategies trained by deep reinforcement learning, and to address the poor tracking performance at other speeds of reinforcement learning models trained at a single speed, a trajectory tracking control method that fuses rolling optimization with the twin delayed deep deterministic policy gradient (ROTD3) was proposed. A twin delayed deep deterministic policy gradient (TD3) trajectory tracking model was trained by tracking a double lane change trajectory at a fixed speed, and its parameters were tuned to obtain a strategy that met the required tracking accuracy and converged quickly. Based on the trained TD3 model and the idea of model predictive control (MPC), the ROTD3 framework was constructed. Over the prediction horizon, the vehicle states were predicted with the front-wheel steering angle output by the TD3 model. The lateral deviation and heading angle deviation during trajectory tracking were optimized over the rolling horizon, and the control-horizon sequence, expressed as front-wheel steering angle increments, was solved by quadratic programming. The first control increment in the control horizon was added to the TD3 control quantity, and the sum was used as the front-wheel steering angle command. ROTD3, TD3, and MPC were compared in trajectory tracking simulations. The results show that ROTD3 achieves higher tracking accuracy: in double lane change tracking at a longitudinal speed of 20 m·s⁻¹, the mean absolute error of lateral deviation is 83.52% lower than with TD3 and 91.02% lower than with MPC. When tracking a serpentine trajectory, the ROTD3 results are basically consistent with those of the double lane change simulation. When the front-wheel steering angle output by the TD3 model produces large tracking deviations, the steering angle increment obtained by rolling horizon optimization effectively compensates for them. The ROTD3 framework therefore significantly improves vehicle trajectory tracking under different operating conditions and enhances the generalization and applicability of the TD3 reinforcement learning tracking strategy.
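The correction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear error model, the function name `rotd3_increments`, the horizon lengths, and the closed-form unconstrained solution (used here in place of a constrained quadratic programming solver) are all assumptions made for illustration. The deviation weights (10, 8) and increment weight (1) follow Table 3, and the steering limit of ±0.5 rad follows Table 2.

```python
import numpy as np


def rotd3_increments(A, B, C, x0, u_td3, n_ctrl, q_weights, r_weight):
    """Steering increments that correct a TD3 baseline over the horizon.

    Assumed model (hypothetical): x[k+1] = A x[k] + B u[k], y[k] = C x[k],
    with scalar steering u[k] = u_td3[k] + (sum of increments up to step k).
    """
    n_pred, n_y = len(u_td3), C.shape[0]

    # Free response: deviations predicted with the TD3 steering alone.
    free = np.zeros(n_pred * n_y)
    x = np.asarray(x0, dtype=float)
    for k in range(n_pred):
        x = A @ x + B * u_td3[k]
        free[k * n_y:(k + 1) * n_y] = C @ x

    # Impulse responses g[m] = C A^m B of the steering correction.
    g, col = [], np.asarray(B, dtype=float)
    for _ in range(n_pred):
        g.append(C @ col)
        col = A @ col

    # G maps the correction applied at each step to the stacked outputs.
    G = np.zeros((n_pred * n_y, n_pred))
    for k in range(n_pred):
        for i in range(k + 1):
            G[k * n_y:(k + 1) * n_y, i] = g[k - i]

    # Increments accumulate; the accumulated correction is held constant
    # after the control horizon ends.
    T = np.tril(np.ones((n_pred, n_ctrl)))
    Phi = G @ T

    Q = np.kron(np.eye(n_pred), np.diag(q_weights))
    H = Phi.T @ Q @ Phi + r_weight * np.eye(n_ctrl)
    # Unconstrained QP: minimize (free + Phi du)' Q (free + Phi du) + r du'du.
    return np.linalg.solve(H, -Phi.T @ Q @ free)


# Illustrative kinematic error model (an assumption, not from the paper).
dt, v, wheelbase = 0.05, 20.0, 1.035 + 1.655
A = np.array([[1.0, v * dt], [0.0, 1.0]])  # states: lateral, heading deviation
B = np.array([0.0, v * dt / wheelbase])
C = np.eye(2)
u_td3 = np.zeros(10)                       # stand-in for TD3 actor outputs
du = rotd3_increments(A, B, C, [0.5, 0.0], u_td3, n_ctrl=5,
                      q_weights=[10.0, 8.0], r_weight=1.0)
# First increment plus the TD3 output, clipped to the steering range.
delta_f = float(np.clip(u_td3[0] + du[0], -0.5, 0.5))
```

Only the first increment is applied at each sampling instant, and the optimization is repeated at the next instant with a fresh TD3 prediction. A practical version would likely pass the same cost to a QP solver so that steering limits can be enforced inside the optimization rather than by clipping afterwards.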
-
Key words: automotive engineering; intelligent vehicle; trajectory tracking; TD3; rolling horizon optimization
-
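Table 1. Training parameters of TD3
| Parameter | Value |
| --- | --- |
| Sample time/s | 0.01 |
| Total simulation time/s | 11 |
| Critic learning rate | 1.0×10⁻⁴ |
| Critic optimizer | Adam |
| Critic gradient threshold | 1 |
| Critic L2 regularization coefficient | 1.0×10⁻⁴ |
| Actor learning rate | 8.0×10⁻⁴ |
| Actor optimizer | Adam |
| Actor gradient threshold | 1 |
| Actor L2 regularization coefficient | 0.001 |
| Discount factor for future rewards | 0.99 |
| Target network update smoothing factor | 0.005 |
| Target network update coefficient | 10 |
| Experience buffer size | 1.0×10⁶ |
| Random mini-batch size | 256 |
| Total training episodes | 100 |
| Exploration noise type | Ornstein-Uhlenbeck |
| Exploration noise initial value | 0.02 |
| Exploration noise variance decay | 1.0×10⁻⁴ |
| Exploration noise minimum value | 1.0×10⁻⁴ |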
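Two of the settings in Table 1 can be sketched in code to show what they control. This is an illustration under assumptions: the helper names are hypothetical, the smoothing factor is taken to be the usual Polyak averaging weight for target networks, and the geometric decay rule for the exploration noise scale is assumed because the source does not state it.

```python
import numpy as np

TAU = 0.005  # target network update smoothing factor (Table 1)


def soft_update(target_params, online_params, tau=TAU):
    """Move each target parameter a fraction tau toward its online value."""
    return [(1.0 - tau) * t + tau * w
            for t, w in zip(target_params, online_params)]


def noise_scale(step, initial=0.02, decay=1.0e-4, minimum=1.0e-4):
    """Exploration noise scale after `step` updates.

    Assumes the scale shrinks by a factor (1 - decay) per step and is
    floored at `minimum`; the actual rule is not given in the source.
    """
    return max(initial * (1.0 - decay) ** step, minimum)


# One soft update from all-zero target weights toward all-one online weights.
updated = soft_update([np.zeros(2)], [np.ones(2)])
```

With a smoothing factor of 0.005 the target networks trail the online networks slowly, which is the usual way TD3 stabilizes its critic targets.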
Table 2. Parameters of intelligent vehicle
| Parameter | Value |
| --- | --- |
| m/kg | 1 704.7 |
| a/m | 1.035 |
| b/m | 1.655 |
| Cf/(N·rad⁻¹) | -31 106.98 |
| Cr/(N·rad⁻¹) | -29 584.56 |
| Iz/(kg·m²) | 3 048 |
| δf/rad | [-0.5, 0.5] |
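The parameters in Table 2 are the ones a two-degree-of-freedom single-track (bicycle) lateral model requires, so a sketch of that model is given here for orientation. Several points are assumptions rather than statements from the source: that the paper uses this model form, that a and b are the distances from the centre of mass to the front and rear axles, that Cf and Cr are per-axle values, and that the negative stiffness sign convention applies. The function name `bicycle_model` is hypothetical.

```python
import numpy as np

# Values from Table 2; treating Cf and Cr as per-axle stiffnesses is an assumption.
M, A_F, B_R = 1704.7, 1.035, 1.655
C_F, C_R, I_Z = -31106.98, -29584.56, 3048.0


def bicycle_model(u):
    """Continuous-time matrices for the state [lateral velocity, yaw rate].

    Uses the sign convention in which cornering stiffness is negative.
    u is the longitudinal speed in m/s; the input is the front-wheel angle.
    """
    A = np.array([
        [(C_F + C_R) / (M * u), (A_F * C_F - B_R * C_R) / (M * u) - u],
        [(A_F * C_F - B_R * C_R) / (I_Z * u),
         (A_F**2 * C_F + B_R**2 * C_R) / (I_Z * u)],
    ])
    B = np.array([-C_F / M, -A_F * C_F / I_Z])
    return A, B


# Evaluate at the 20 m/s test speed mentioned in the abstract.
A, B = bicycle_model(20.0)
poles = np.linalg.eigvals(A)
```

Because the speed u appears in the denominators of A, the dynamics change with the operating speed, which is one way to see why a policy trained at a single speed may track poorly at others.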
Table 3. Parameters of MPC
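| Parameter | Value |
| --- | --- |
| Control horizon | 20 |
| Prediction horizon | 8 |
| Discrete time/s | 0.05 |
| Control increment weight | 1 |
| Lateral deviation weight | 10 |
| Heading angle deviation weight | 8 |
-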