
Trajectory tracking control of intelligent vehicles based on deep reinforcement learning and rolling horizon optimization

XIE Xian-yi, ZHAO Xin, JIN Li-sheng, GUO Bai-cang, LI Ke-qiang

Citation: XIE Xian-yi, ZHAO Xin, JIN Li-sheng, GUO Bai-cang, LI Ke-qiang. Trajectory tracking control of intelligent vehicles based on deep reinforcement learning and rolling horizon optimization[J]. Journal of Traffic and Transportation Engineering, 2024, 24(6): 259-272. doi: 10.19818/j.cnki.1671-1637.2024.06.018

doi: 10.19818/j.cnki.1671-1637.2024.06.018
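Funds:

National Natural Science Foundation of China 52072333

National Natural Science Foundation of China 52202503

Open Fund Project of State Key Laboratory of Automotive Safety and Energy Conservation KFY2211

Natural Science Foundation of Hebei Province F2021203107

Natural Science Foundation of Hebei Province F2022203054

National Key Research and Development Program of China 2022YFF0604901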

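More information
    Author biography:

    XIE Xian-yi (1989-), male, from Qiqihar, Heilongjiang; associate professor at Yanshan University; Doctor of Engineering; research interest: decision-making and planning for intelligent vehicles

    Corresponding author:

    JIN Li-sheng (1975-), male, from Linqu, Shandong; professor at Yanshan University; Doctor of Engineering

  • CLC number: U461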

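  • Abstract: To improve the generalization of intelligent vehicle trajectory tracking strategies trained by deep reinforcement learning, and to address the problem that a reinforcement learning model trained at a single speed tracks poorly at other speeds, a trajectory tracking control method that fuses rolling optimization with the twin delayed deep deterministic policy gradient (ROTD3) was proposed. A twin delayed deep deterministic policy gradient (TD3) tracking model was trained by tracking a double lane change trajectory at a fixed vehicle speed, and its parameters were tuned to obtain a strategy that met the tracking accuracy requirement and converged quickly. Based on the trained TD3 model, the ROTD3 framework was built following the idea of model predictive control (MPC). Within the prediction horizon, the vehicle states were predicted using the front wheel steering angle output by the TD3 model. The lateral deviation and heading angle deviation during tracking were optimized over the rolling horizon, and the control horizon sequence, expressed as front wheel steering angle increments, was solved by quadratic programming. The first control increment in the control horizon was added to the TD3 control output to form the front wheel steering angle command. Trajectory tracking simulation tests were carried out with ROTD3, TD3, and MPC, and the results were compared. The results show that ROTD3 achieves higher tracking accuracy. When tracking the double lane change trajectory at a longitudinal speed of 20 m·s⁻¹, the mean absolute error of the lateral deviation decreases by 83.52% relative to TD3 and by 91.02% relative to MPC. When tracking the snake-like trajectory, the ROTD3 simulation results are basically consistent with those of the double lane change trajectory. When the front wheel steering angle output by the TD3 model produces a large tracking deviation, the steering angle increment obtained by ROTD3 through rolling horizon optimization compensates for it effectively. Therefore, the ROTD3 framework significantly improves vehicle trajectory tracking under different working conditions and effectively improves the generalization and applicability of the TD3 reinforcement learning tracking strategy.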
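The correction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: a small-angle kinematic error model, a TD3 steering angle held constant over the prediction horizon, an unconstrained quadratic program solved in closed form, and clipping of the final command in place of explicit constraints. The function name `rotd3_step`, the wheelbase, the horizon lengths, and the weights are all hypothetical placeholders.

```python
import numpy as np


def rotd3_step(e0, delta_td3, vx, kappa_ref=0.0, wheelbase=2.69, dt=0.05,
               n_pred=20, n_ctrl=8, q_lat=10.0, q_head=8.0, r_inc=1.0,
               delta_max=0.5):
    """One rolling-horizon correction of a TD3 steering command (sketch).

    e0 is [lateral deviation, heading deviation]. All default values are
    illustrative assumptions, not values confirmed for ROTD3.
    Returns (steering command, first steering increment).
    """
    # Assumed linearized error model: e[k+1] = A e[k] + B * delta[k] + d
    A = np.array([[1.0, vx * dt], [0.0, 1.0]])
    B = np.array([0.0, vx * dt / wheelbase])
    d = np.array([0.0, -vx * dt * kappa_ref])

    # Free response: predicted errors when only the TD3 steering is applied
    # (assumption: the TD3 output is held constant over the horizon).
    e = np.asarray(e0, dtype=float)
    free = []
    for _ in range(n_pred):
        e = A @ e + B * delta_td3 + d
        free.append(e)
    E0 = np.concatenate(free)

    # C maps increments to the cumulative steering correction at each step;
    # after the control horizon ends, the correction is held.
    C = np.zeros((n_pred, n_ctrl))
    for i in range(n_pred):
        C[i, :min(i, n_ctrl - 1) + 1] = 1.0

    # G maps the correction at step i to the predicted error at step k + 1.
    G = np.zeros((2 * n_pred, n_pred))
    for k in range(n_pred):
        for i in range(k + 1):
            G[2 * k:2 * k + 2, i] = np.linalg.matrix_power(A, k - i) @ B
    theta = G @ C

    # Quadratic cost on predicted deviations and on steering increments.
    Q = np.kron(np.eye(n_pred), np.diag([q_lat, q_head]))
    H = theta.T @ Q @ theta + r_inc * np.eye(n_ctrl)
    g = theta.T @ Q @ E0
    du = np.linalg.solve(H, -g)  # unconstrained minimizer of the QP

    # Only the first increment is applied, added to the TD3 command.
    delta = float(np.clip(delta_td3 + du[0], -delta_max, delta_max))
    return delta, float(du[0])


if __name__ == "__main__":
    command, increment = rotd3_step([0.5, 0.0], delta_td3=0.0, vx=10.0)
    print(command, increment)
```

In this sketch a positive lateral deviation with zero TD3 steering produces a negative first increment, which is the compensating behaviour the abstract attributes to the rolling horizon optimization. A constrained solver would replace the closed-form solve if steering or increment limits had to be enforced inside the optimization.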

     

  • Figure 1. Vehicle kinematic model

    Figure 2. Trajectory tracking control process of intelligent vehicle based on TD3

    Figure 3. Double lane change trajectory

    Figure 4. TD3 training process of double lane change trajectory

    Figure 5. Tracking effects of TD3 double lane change trajectory under different vehicle speeds

    Figure 6. ROTD3-based tracking control process

    Figure 7. Control principle of ROTD3

    Figure 8. Simulation results of double lane change trajectory tracking with vx = 10 m·s⁻¹

    Figure 9. Simulation results of double lane change trajectory tracking with vx = 20 m·s⁻¹

    Figure 10. Simulation results of snake-like trajectory tracking with vx = 10 m·s⁻¹

    Figure 11. Simulation results of snake-like trajectory tracking with vx = 20 m·s⁻¹

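    Table 1. Training parameters of TD3

    Parameter                                     Value
    Sampling time/s                               0.01
    Total simulation time/s                       11
    Critic: learning rate                         1.0×10⁻⁴
    Critic: optimizer                             Adam
    Critic: gradient threshold                    1
    Critic: L2 regularization coefficient         1.0×10⁻⁴
    Actor: learning rate                          8.0×10⁻⁴
    Actor: optimizer                              Adam
    Actor: gradient threshold                     1
    Actor: L2 regularization coefficient          0.001
    Discount factor for future rewards            0.99
    Target network update smoothing factor        0.005
    Target network update coefficient             10
    Experience buffer size                        1.0×10⁶
    Random mini-batch size                        256
    Total training episodes                       100
    Exploration noise: type                       Ornstein-Uhlenbeck
    Exploration noise: initial value              0.02
    Exploration noise: variance decay             1.0×10⁻⁴
    Exploration noise: minimum value              1.0×10⁻⁴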

    Table 2. Parameters of intelligent vehicle

    Parameter         Value
    m/kg              1 704.7
    a/m               1.035
    b/m               1.655
    Cf/(N·rad⁻¹)      -31 106.98
    Cr/(N·rad⁻¹)      -29 584.56
    Iz/(kg·m²)        3 048
    δf/rad            [-0.5, 0.5]

    Table 3. Parameters of MPC

    Parameter                         Value
    Control horizon                   20
    Prediction horizon                8
    Discrete time/s                   0.05
    Control increment weight          1
    Lateral deviation weight          10
    Heading angle deviation weight    8
Figures (11) / Tables (3)
Publication history
  • Received: 2024-08-29
  • Published: 2024-12-25
