Human-machine integration method for steering decision-making of intelligent vehicle based on reinforcement learning
Abstract: This paper addresses the continuous, dynamic allocation of driving weights between the human driver and the autonomous driving system in the human-machine integration (HMI) driving system of intelligent vehicles. It focuses in particular on the poor adaptability of weight allocation methods caused by modeling errors, and it proposes an HMI steering decision-making method based on reinforcement learning. To capture drivers' steering characteristics, a driver model based on two-point preview was built, and an autonomous steering control model of the intelligent vehicle was established using predictive control theory. On this basis, a steering control framework with the human and the machine simultaneously in the loop was constructed. Using the Actor-Critic reinforcement learning architecture, a deep deterministic policy gradient (DDPG) agent for human-machine driving weight allocation was designed, together with a model-based reward function that targets curvature conformity, tracking accuracy, and ride comfort. The resulting reinforcement learning framework for HMI driving weight allocation comprises the driver model, the autonomous steering model, the driving weight allocation agent, and the reward function. To verify the effectiveness of the method, eight drivers were recruited, and a total of 48 simulated driving trials were carried out.

Research results show that in the curvature adaptability verification, the HMI-DDPG method outperforms both manual driving and the HMI-Fuzzy method: tracking performance improves by an average of 70.69% and 39.67%, respectively, and comfort improves by an average of 18.34% and 7.55%, respectively. In the speed adaptability verification, at vehicle speeds of 40, 60, and 80 km·h⁻¹, the driver's weight exceeds 0.5 for 90.00%, 85.76%, and 60.74% of the time, respectively. The phase trajectories of both tracking performance and comfort converge effectively. Therefore, the proposed method adapts to changes in curvature and vehicle speed and improves tracking performance and comfort while ensuring safety.
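The abstract describes continuously allocating a driving weight between the driver and the autonomous steering controller, but this excerpt does not state the fusion law. The sketch below assumes the common linear blend δ = w·δ_h + (1 − w)·δ_a, where w is the driver's weight. It also assumes the DDPG actor emits a tanh-bounded action in [−1, 1] that is mapped affinely onto w ∈ [0, 1]. Both functions, `action_to_weight` and `blend_steering`, are hypothetical names used only for illustration.

```python
def action_to_weight(action: float) -> float:
    """Map a tanh-bounded actor output in [-1, 1] to a driver weight in [0, 1].

    Assumption: the paper's actual output mapping is not given in this excerpt.
    """
    clipped = max(-1.0, min(1.0, action))  # guard against values slightly outside the tanh range
    return (clipped + 1.0) / 2.0


def blend_steering(delta_driver: float, delta_auto: float, weight: float) -> float:
    """Hypothetical linear fusion of driver and controller steering angles.

    `weight` is the driver's share of authority; the remaining (1 - weight)
    goes to the autonomous steering controller.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("driver weight must lie in [0, 1]")
    return weight * delta_driver + (1.0 - weight) * delta_auto


if __name__ == "__main__":
    w = action_to_weight(0.5)  # 0.75: driver-dominant, since w > 0.5
    print(w, blend_steering(2.0, 4.0, w))
```

With a linear blend, "driver weight greater than 0.5" simply means the driver's command contributes more than the controller's to the final steering angle.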
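Table 1. Reward function parameters

| Parameter | Value | Basis for value |
|---|---|---|
| τ1 | 1/3 | Reciprocal of the mean steering angle (°) |
| τ2 | 1 | Reciprocal of the mean lateral acceleration (m·s⁻²) |
| τ3 | 10 | Reciprocal of the mean sideslip angle at the center of mass (°) |
| τ4 | 5 | Reciprocal of the mean position error (m) |
| τ5 | 2 | Reciprocal of the mean heading angle error (°) |
| σ1, σ2, σ3 | -1, -1, -1 | Equal weights |
| ρ1, ρ2, ρ3 | 1, 1, 10 | |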
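Table 1 defines each τ as the reciprocal of the mean of a raw signal. Multiplying a signal by its τ therefore makes it dimensionless, with a mean-sized value mapping to about 1. The reward formula itself is not reproduced in this excerpt, so the sketch below only illustrates this normalization step and recovers the mean magnitudes that the τ values imply. The signal names are hypothetical.

```python
# Normalization factors from Table 1 (tau_i = 1 / mean of the raw signal).
TAU = {
    "steering_angle_deg": 1.0 / 3.0,
    "lateral_accel_mps2": 1.0,
    "sideslip_angle_deg": 10.0,
    "position_error_m": 5.0,
    "heading_error_deg": 2.0,
}


def implied_means(tau: dict[str, float]) -> dict[str, float]:
    """Recover the mean signal magnitudes implied by tau_i = 1 / mean_i."""
    return {name: 1.0 / value for name, value in tau.items()}


def normalize(sample: dict[str, float], tau: dict[str, float] = TAU) -> dict[str, float]:
    """Scale absolute raw signals by tau so that a mean-sized signal maps to about 1."""
    return {name: tau[name] * abs(value) for name, value in sample.items()}


if __name__ == "__main__":
    print(implied_means(TAU))
    print(normalize({"position_error_m": -0.4, "heading_error_deg": 0.5}))
```

For example, the implied mean position error is 0.2 m, so a 0.4 m error normalizes to 2. This puts signals with very different units (degrees, metres, m·s⁻²) on a comparable scale before they are weighted.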
Table 2. DDPG algorithm parameters
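| Parameter | Value |
|---|---|
| Sampling step/s | 0.1 |
| Duration of a single training episode/s | 60 |
| Critic learning rate | 5.0×10⁻⁴ |
| Actor learning rate | 1.0×10⁻³ |
| Smoothing factor | 1.0×10⁻³ |
| Number of sampled experiences (mini-batch size) | 64 |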
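In DDPG as described by Lillicrap et al., the target networks track the learned networks through the soft update θ′ ← s·θ + (1 − s)·θ′, where s is a small smoothing factor. The sketch below assumes that the smoothing factor in Table 2 is this soft-update coefficient. It collects the Table 2 values in a configuration object and applies the update to plain lists standing in for network parameters. The names `DDPGConfig` and `soft_update` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DDPGConfig:
    """Hyperparameters from Table 2."""
    sample_time_s: float = 0.1
    episode_duration_s: float = 60.0
    critic_lr: float = 5.0e-4
    actor_lr: float = 1.0e-3
    smoothing_factor: float = 1.0e-3  # assumed: target-network soft-update rate
    batch_size: int = 64              # experiences sampled from the replay buffer per update

    @property
    def steps_per_episode(self) -> int:
        # round() guards against floating-point error in the division
        return round(self.episode_duration_s / self.sample_time_s)


def soft_update(target: list[float], online: list[float], s: float) -> list[float]:
    """Return target parameters moved a fraction s toward the online parameters."""
    if len(target) != len(online):
        raise ValueError("parameter vectors must have equal length")
    return [s * w + (1.0 - s) * t for t, w in zip(target, online)]


if __name__ == "__main__":
    cfg = DDPGConfig()
    print(cfg.steps_per_episode)
    print(soft_update([0.0, 1.0], [1.0, 1.0], cfg.smoothing_factor))
```

A 60 s episode sampled every 0.1 s gives 600 weight-allocation decisions per episode. With s = 10⁻³, each target parameter moves only 0.1% of the way toward its online counterpart per update, which keeps the critic's bootstrapped targets slowly varying.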
Table 3. Advantages of the proposed method in condition 1
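Values are the advantages (%) of HMI-DDPG over each comparison method.

| Metric | Compared with | Driver 1 | Driver 2 | Driver 3 | Driver 4 | Driver 5 | Driver 6 | Driver 7 | Driver 8 | Mean |
|---|---|---|---|---|---|---|---|---|---|---|
| e1max | Manual driving | 67.89 | 80.95 | 77.73 | 85.34 | 81.21 | 74.01 | 73.56 | 77.93 | 77.33 |
| e1max | HMI-Fuzzy | 36.27 | 32.02 | 34.63 | 38.39 | 39.11 | 47.77 | 8.73 | 22.01 | 32.37 |
| e2max | Manual driving | 60.40 | 77.40 | 70.80 | 75.06 | 63.11 | 27.61 | 63.18 | 74.86 | 64.05 |
| e2max | HMI-Fuzzy | 29.57 | 62.77 | 56.40 | 61.60 | 51.49 | 41.37 | 51.12 | 21.41 | 46.97 |
| amax | Manual driving | 16.33 | 31.52 | 29.69 | 7.44 | 19.46 | 22.29 | 15.71 | 27.59 | 21.25 |
| amax | HMI-Fuzzy | 8.18 | 15.72 | 5.02 | 16.39 | 15.83 | 14.53 | -0.55 | 5.11 | 10.03 |
| βmax | Manual driving | 12.47 | 19.22 | 21.75 | 8.91 | 16.00 | 14.36 | 13.21 | 17.49 | 15.43 |
| βmax | HMI-Fuzzy | 5.31 | 3.91 | 1.71 | 8.95 | 8.68 | 7.07 | 1.97 | 2.99 | 5.07 |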
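The averages quoted in the abstract can be reproduced from the Mean column of Table 3. The tracking improvement equals the mean of the e1max and e2max advantages, and the comfort improvement equals the mean of the amax and βmax advantages. This grouping of metrics into tracking and comfort is inferred from the arithmetic below; the excerpt does not state it explicitly.

```python
# Mean advantages (%) of HMI-DDPG, taken from the Mean column of Table 3.
MEAN_ADVANTAGE = {
    "manual": {"e1max": 77.33, "e2max": 64.05, "amax": 21.25, "beta_max": 15.43},
    "fuzzy": {"e1max": 32.37, "e2max": 46.97, "amax": 10.03, "beta_max": 5.07},
}

# Inferred grouping: position/heading errors -> tracking; lateral acceleration/sideslip -> comfort.
GROUPS = {"tracking": ("e1max", "e2max"), "comfort": ("amax", "beta_max")}


def group_average(baseline: str, group: str) -> float:
    """Average the mean advantages of the metrics in one group, rounded to 0.01%."""
    values = [MEAN_ADVANTAGE[baseline][metric] for metric in GROUPS[group]]
    return round(sum(values) / len(values), 2)


for baseline in MEAN_ADVANTAGE:
    for group in GROUPS:
        print(baseline, group, group_average(baseline, group))
```

The four results (70.69, 18.34, 39.67, 7.55) match the tracking and comfort figures reported in the abstract.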
Table 4. Comparison of indicators for driver 1 in condition 1
| Method | e1max/m | e2max/(°) | βmax/(°) | amax/(m·s⁻²) | Δδ0/[(°)·s⁻¹] | Δa0/(m·s⁻³) |
|---|---|---|---|---|---|---|
| Manual driving | 0.875 | 2.88 | 0.339 | 3.53 | 5.42 | 0.482 |
| HMI-Fuzzy | 0.441 | 1.62 | 0.313 | 3.22 | 4.76 | 0.356 |
| HMI-DDPG | 0.281 | 1.14 | 0.296 | 2.95 | 3.31 | 0.272 |
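The per-driver entries in Table 3 appear to be relative reductions of each peak indicator, (baseline − HMI-DDPG)/baseline × 100%. Applying this formula to the rounded driver 1 values in Table 4 reproduces the driver 1 column of Table 3 to within about a quarter of a percentage point. The small residual is consistent with rounding of the tabulated raw values. This interpretation is inferred from the two tables, not stated in the excerpt.

```python
# Driver 1, condition 1 (Table 4): peak values per method.
TABLE4 = {
    "manual": {"e1max": 0.875, "e2max": 2.88, "beta_max": 0.339, "amax": 3.53},
    "fuzzy": {"e1max": 0.441, "e2max": 1.62, "beta_max": 0.313, "amax": 3.22},
    "ddpg": {"e1max": 0.281, "e2max": 1.14, "beta_max": 0.296, "amax": 2.95},
}


def relative_improvement(baseline: float, proposed: float) -> float:
    """Percentage reduction of a peak indicator relative to the baseline method."""
    if baseline == 0:
        raise ZeroDivisionError("baseline must be non-zero")
    return (baseline - proposed) / baseline * 100.0


for method in ("manual", "fuzzy"):
    for metric, base in TABLE4[method].items():
        gain = relative_improvement(base, TABLE4["ddpg"][metric])
        print(f"{metric:9s} vs {method:6s}: {gain:6.2f} %")
```

For example, the e1max reduction relative to manual driving is (0.875 − 0.281)/0.875 ≈ 67.89%, which matches the first entry of Table 3.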
Table 5. Proportion of time with driver weight greater than 0.5 in condition 2
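Values are the proportions of time (%) during which each driver's weight exceeds 0.5.

| Speed/(km·h⁻¹) | Driver 1 | Driver 2 | Driver 3 | Driver 4 | Driver 5 | Driver 6 | Driver 7 | Driver 8 | Mean |
|---|---|---|---|---|---|---|---|---|---|
| 40 | 91.45 | 89.76 | 87.00 | 87.76 | 91.67 | 92.49 | 89.98 | 89.89 | 90.00 |
| 60 | 90.34 | 66.54 | 83.41 | 89.97 | 91.74 | 85.37 | 83.94 | 94.80 | 85.76 |
| 80 | 71.92 | 53.98 | 55.09 | 67.53 | 65.53 | 51.00 | 47.62 | 73.21 | 60.74 |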
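Each Table 5 entry is the share of a run during which the driver's weight exceeds 0.5. The Mean column is the arithmetic mean over the eight drivers. The sketch below computes that share from a sampled weight trace (for instance, one value per 0.1 s sampling step) and recomputes the per-speed means. The weight trace in the example is made up for illustration, and `driver_dominant_ratio` is a hypothetical name.

```python
from statistics import fmean


def driver_dominant_ratio(weights: list[float], threshold: float = 0.5) -> float:
    """Percentage of samples in which the driver's weight exceeds the threshold."""
    if not weights:
        raise ValueError("weight trace is empty")
    above = sum(1 for w in weights if w > threshold)
    return 100.0 * above / len(weights)


# Per-driver ratios (%) from Table 5.
TABLE5 = {
    40: [91.45, 89.76, 87.00, 87.76, 91.67, 92.49, 89.98, 89.89],
    60: [90.34, 66.54, 83.41, 89.97, 91.74, 85.37, 83.94, 94.80],
    80: [71.92, 53.98, 55.09, 67.53, 65.53, 51.00, 47.62, 73.21],
}

for speed, ratios in TABLE5.items():
    print(f"{speed} km/h: mean {fmean(ratios):.3f} %")

# Illustrative (made-up) trace: 3 of 4 samples are driver-dominant.
print(driver_dominant_ratio([0.8, 0.6, 0.4, 0.55]))
```

The recomputed means agree with the Mean column to rounding. The drop at 80 km·h⁻¹ reflects that the agent shifts authority toward the autonomous controller more often at higher speed.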