Hierarchical assessment method for vehicle accident risk classification based on temporal misalignment causal variables and sub-objective training
-
Abstract: Existing models approximate the "historical factors-future accidents" relationship with the "historical factors-historical accidents" relationship, which lowers assessment accuracy. To address this problem, a hierarchical assessment method for vehicles' future accident risk was proposed, oriented toward traffic management practice. Based on massive static traffic management data, characterization indicators were designed and selected around vehicle attributes, historical violations, and historical accidents. Historical static indicators were then paired with accident risk from a later period to construct a temporal misalignment causal dataset. To meet the multi-dimensional, multi-level needs of traffic management departments in accident risk prevention and control, a sub-objective training method was applied: weighted samples were used to train high-, medium-, and low-accuracy sub-models, and logical rules integrated their outputs into four risk levels (high, medium, low, and extremely low). The analysis results show that temporally misaligned modeling significantly improves recognition performance compared with temporally aligned modeling. The fusion model performs well across all five vehicle types: accuracy and recall for each vehicle type are 78.78%-93.80% and 72.01%-93.98%, respectively, and the precision of extremely-low-risk identification exceeds 97%. Prediction precision changes in a graded manner across risk levels, which meets the requirement for graded identification of vehicles with different accident risks. The model is also reasonably robust. When transferred to a different year and a specific vehicle type, it achieves an overall accuracy of 82.71% and an overall recall of 93.67%. Although recognition performance is somewhat affected, the results still meet practical requirements; in particular, the identification precision for extremely-low-risk vehicles remains as high as 99.60%. The construction of the temporal misalignment dataset and the sub-objective training method therefore effectively improve the recognition of vehicles' future accident risk and provide a feasible technical path for traffic management departments to carry out hierarchical and graded risk prevention and control.
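The temporal misalignment idea in the abstract can be illustrated with a minimal sketch: features are taken from historical years and the accident label from a later year. The record layout, the field names `violations` and `accidents`, and the function `build_misaligned_samples` are hypothetical and chosen for illustration; they are not the authors' pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical yearly records: vehicle_id -> {year: {"violations": int, "accidents": int}}
Records = Dict[str, Dict[int, Dict[str, int]]]


def build_misaligned_samples(records: Records, feature_years: List[int],
                             label_year: int) -> List[Tuple[str, List[int], int]]:
    """Pair features from `feature_years` with an accident label from `label_year`."""
    if label_year <= max(feature_years):
        raise ValueError("label_year must come after every feature year")
    samples = []
    for vehicle_id, by_year in sorted(records.items()):
        # Skip vehicles that lack a record for the label year or any feature year.
        if label_year not in by_year or any(y not in by_year for y in feature_years):
            continue
        features: List[int] = []
        for year in feature_years:
            features.append(by_year[year]["violations"])
            features.append(by_year[year]["accidents"])
        # The label looks only at the later year, never at the feature years.
        label = 1 if by_year[label_year]["accidents"] > 0 else 0
        samples.append((vehicle_id, features, label))
    return samples


records = {
    "veh_a": {2021: {"violations": 3, "accidents": 1},
              2022: {"violations": 2, "accidents": 0},
              2023: {"violations": 0, "accidents": 1}},
    "veh_b": {2021: {"violations": 0, "accidents": 0},
              2022: {"violations": 1, "accidents": 0},
              2023: {"violations": 0, "accidents": 0}},
}
samples = build_misaligned_samples(records, feature_years=[2021, 2022], label_year=2023)
```

A temporally aligned dataset would instead take the label from the same years as the features, which is the setting the abstract reports as performing worse.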
-
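Table 1. Characteristic value processing of the influencing factors of basic vehicle attributes
| Attribute | Before processing | After processing | Code |
|---|---|---|---|
| Body color | A-white, B-gray, C-yellow, D-pink, E-red, F-purple, G-green, H-blue, I-brown, J-black | 0-white, 1-black, 2-warm colors, 3-cool colors, 4-other | CSYS |
| Vehicle age | None | 0: 0~5 years, 1: 5~10 years, 2: 10~20 years, 3: over 20 years | CL |
| Vehicle status | A-normal, B-transferred out, Q-inspection overdue, I-accident unresolved, G-violation unresolved, O-locked, N-hit-and-run, M-reached scrappage standard | 0-normal, 1-violation unresolved, 2-accident unresolved, 3-inspection overdue or reached scrappage standard, 4-other | ZT |
| Ownership | 1-organization, 2-individual | 1-organization, 2-individual | SYQ |
| New energy vehicle | 1-yes, 2-no | 1-yes, 2-no | SFXNY |
| Usage type | A-non-commercial, B-highway passenger transport, C-bus passenger transport, D-taxi passenger transport, E-tourist passenger transport, F-freight, G-rental, H-police, O-kindergarten school bus, P-primary school bus | 0-non-commercial, 1-commercial, 2-other | SYXZ |
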
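The recoding in Table 1 can be sketched as simple lookup functions. The status mapping follows the table directly. The treatment of bin edges for vehicle age (left-closed intervals) is an assumption, since the table does not say which bin a vehicle of exactly 5, 10, or 20 years falls into, and the function names are hypothetical.

```python
# Vehicle status (ZT): raw letter code -> processed category, per Table 1.
STATUS_MAP = {
    "A": 0,  # normal
    "G": 1,  # violation unresolved
    "I": 2,  # accident unresolved
    "Q": 3,  # inspection overdue
    "M": 3,  # reached scrappage standard
}
OTHER_STATUS = 4  # every remaining raw code (e.g. B, O, N) falls into "other"


def encode_status(raw_code: str) -> int:
    """Map a raw vehicle status code to its processed category."""
    return STATUS_MAP.get(raw_code, OTHER_STATUS)


def encode_vehicle_age(age_years: float) -> int:
    """Bin vehicle age (CL). Assumed bins: [0, 5), [5, 10), [10, 20), [20, inf)."""
    if age_years < 0:
        raise ValueError("age_years must be non-negative")
    if age_years < 5:
        return 0
    if age_years < 10:
        return 1
    if age_years < 20:
        return 2
    return 3


encoded = [(encode_status(s), encode_vehicle_age(a))
           for s, a in [("A", 3.0), ("Q", 12.5), ("N", 25.0)]]
```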
Table 2. Binary logistic regression results for violations
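| Code | Violation | P value |
|---|---|---|
| W1 | Improper use of lights | 0.000 |
| W2 | Driving a vehicle that does not meet safety standards | 0.000 |
| W3 | Forced overtaking | 0.000 |
| W4 | Driving without a license | 0.000 |
| W5 | Violating warning signs | 0.000 |
| W6 | Improper use of the horn | 0.000 |
| W7 | Overloading | 0.000 |
| W8 | Improper passing of oncoming vehicles | 0.000 |
| W9 | Other unsafe behavior | 0.000 |
| W10 | Illegal lane use | 0.000 |
| W11 | Failure to pay fines or accept other penalties | 0.000 |
| W12 | Failure to yield as required | 0.000 |
| W13 | Speeding | 0.000 |
| W14 | Drink or drunk driving | 0.000 |
| W15 | Hit-and-run | 0.732 |
| W16 | Wrong-way driving | 0.000 |
| W17 | Fatigued driving | 0.000 |
| W18 | Failure to slow down at pedestrian crossings | 0.000 |
| W19 | Substandard exhaust emissions | 0.584 |
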
Table 3. Classification of vehicle types
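| Vehicle type | Number/veh | Share/% |
|---|---|---|
| Non-commercial passenger car | 2 137 243 | 79.52 |
| Commercial passenger car | 53 418 | 1.99 |
| Light and mini truck | 310 592 | 11.56 |
| Medium and heavy truck | 131 559 | 4.89 |
| Medium and large bus | 54 846 | 2.04 |
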
Table 4. Evaluation results of two datasets
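Unit: %
| Vehicle type | Aligned dataset: accuracy | Aligned dataset: recall | Aligned dataset: precision | Misaligned dataset: accuracy | Misaligned dataset: recall | Misaligned dataset: precision |
|---|---|---|---|---|---|---|
| Commercial passenger car | 83.66 | 75.77 | 64.16 | 95.12 | 88.33 | 86.12 |
| Non-commercial passenger car | 95.16 | 13.55 | 62.38 | 93.16 | 75.33 | 76.15 |
| Medium and large bus | 96.37 | 30.87 | 57.76 | 96.77 | 80.70 | 79.56 |
| Light and mini truck | 96.38 | 37.42 | 62.45 | 95.68 | 77.39 | 74.28 |
| Medium and heavy truck | 97.20 | 36.75 | 57.09 | 96.23 | 84.21 | 76.39 |
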
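Table 4 shows aligned datasets with high accuracy but low recall, a pattern that is typical when accident vehicles are a small minority of the sample. The following minimal sketch computes the three metrics from confusion-matrix counts; the counts are made up for illustration and are not taken from the paper, and the function name is hypothetical.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, recall, and precision for the accident (positive) class."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one sample is required")
    return {
        "accuracy": (tp + tn) / total,
        # Share of actual accident vehicles that were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of flagged vehicles that actually had an accident.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Illustrative imbalanced sample: 1 000 vehicles, 50 of them with an accident.
metrics = classification_metrics(tp=7, fp=4, fn=43, tn=946)
```

With these counts accuracy is 95.3% while recall is only 14%, because correctly classified non-accident vehicles dominate the accuracy figure.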
Table 5. Values of key hyperparameters
| Classifier | Hyperparameter | Candidate values | Dataset 1 value | Dataset 2 value |
|---|---|---|---|---|
| Decision tree | max_depth | {1, 3, 5, 7, 10} | 7 | 7 |
| | min_samples_leaf | {3, 5, 7, 9, 11} | 9 | 9 |
| | criterion | {gini, entropy} | gini | gini |
| | splitter | {best, random} | random | random |
| XGBoost | learning_rate | {0.001, 0.01, 0.1, 0.2, 0.5} | 0.5 | 0.5 |
| | max_depth | {1, 3, 5, 7, 10} | 7 | 7 |
| | n_estimators | {50, 100, 150, 200} | 100 | 100 |
| | min_child_weight | {1, 2, 3, 4, 5} | 5 | 5 |
| LightGBM | learning_rate | {0.001, 0.01, 0.1, 0.2, 0.5} | 0.2 | 0.2 |
| | num_iterations | {50, 100, 200, 300} | 100 | 100 |
| | num_leaves | {10, 20, 30, 40, 50} | 50 | 50 |
| | max_depth | {1, 3, 5, 7, 10} | 7 | 7 |
| | feature_fraction | {0.6, 0.7, 0.8, 0.9, 1.0} | 1.0 | 1.0 |
| Random forest | n_estimators | {50, 100, 200, 300} | 100 | 100 |
| | max_depth | {1, 3, 5, 10, 15} | 10 | 10 |
| | min_samples_leaf | {1, 2, 4, 6, 8} | 6 | 6 |
| | min_samples_split | {2, 5, 7, 9, 10} | 9 | 9 |

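Table 5 lists candidate values but not the search procedure. Assuming an exhaustive grid search, the decision-tree candidates can be enumerated as below; the helper `expand_grid` is hypothetical, and the sketch only shows the size of the search space, not the authors' tuning code.

```python
from itertools import product
from typing import Dict, List

# Candidate values for the decision tree, copied from Table 5.
DECISION_TREE_GRID: Dict[str, list] = {
    "max_depth": [1, 3, 5, 7, 10],
    "min_samples_leaf": [3, 5, 7, 9, 11],
    "criterion": ["gini", "entropy"],
    "splitter": ["best", "random"],
}


def expand_grid(grid: Dict[str, list]) -> List[dict]:
    """Return every combination of candidate values as a parameter dict."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


candidates = expand_grid(DECISION_TREE_GRID)

# The configuration reported as selected for both datasets in Table 5.
selected = {"max_depth": 7, "min_samples_leaf": 9,
            "criterion": "gini", "splitter": "random"}
```

Under this assumption the decision tree has 5 x 5 x 2 x 2 = 100 candidate configurations, each of which would be trained and scored once per validation split.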
Table 6. Comparison of model evaluation indicators
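| Model | Sampling time scale/year | A/% | R/% | P/% | F1 | AP | TP |
|---|---|---|---|---|---|---|---|
| Decision tree | 1 | 93.68 | 59.01 | 33.90 | 0.944 8 | 0.388 1 | 15 324 |
| | 2 | 95.17 | 40.52 | 40.29 | 0.951 7 | 0.387 9 | 10 521 |
| XGBoost | 1 | 94.83 | 47.47 | 38.72 | 0.950 8 | 0.409 3 | 12 327 |
| | 2 | 94.76 | 48.40 | 38.33 | 0.950 5 | 0.411 2 | 12 568 |
| LightGBM | 1 | 94.91 | 47.70 | 39.42 | 0.951 4 | 0.418 7 | 12 385 |
| | 2 | 94.76 | 49.83 | 38.64 | 0.950 8 | 0.423 1 | 12 939 |
| Random forest | 1 | 95.28 | 41.09 | 41.63 | 0.952 7 | 0.407 0 | 10 670 |
| | 2 | 95.13 | 44.72 | 40.80 | 0.952 4 | 0.408 5 | 11 612 |
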
Table 7. Model selection results
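| Vehicle type | Best model | Best dataset |
|---|---|---|
| Commercial passenger car | XGBoost | Dataset 1 |
| Non-commercial passenger car | LightGBM | Dataset 2 |
| Medium and large bus | XGBoost | Dataset 2 |
| Light and mini truck | XGBoost | Dataset 2 |
| Medium and heavy truck | Decision tree | Dataset 2 |
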
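Table 8. Model accuracy (A), recall (R), and precision (P) under different accident sample weights
Unit: %
| Vehicle type | Metric | Weight 1:1 | Weight 3:1 | Weight 5:1 | Weight 7:1 | Weight 9:1 |
|---|---|---|---|---|---|---|
| Commercial passenger car | A | 83.75 | 82.51 | 76.71 | 75.91 | 75.59 |
| | R | 72.67 | 95.29 | 99.16 | 99.49 | 99.77 |
| | P | 65.15 | 54.25 | 51.32 | 50.48 | 50.14 |
| Non-commercial passenger car | A | 96.18 | 95.80 | 91.88 | 88.47 | 87.25 |
| | R | 15.30 | 31.84 | 74.22 | 91.57 | 96.74 |
| | P | 63.63 | 47.30 | 29.83 | 24.89 | 23.70 |
| Medium and large bus | A | 94.62 | 93.52 | 93.34 | 93.23 | 92.84 |
| | R | 62.35 | 90.75 | 94.59 | 95.45 | 96.31 |
| | P | 66.25 | 54.96 | 54.01 | 53.54 | 52.07 |
| Light and mini truck | A | 97.16 | 95.80 | 94.76 | 93.84 | 93.19 |
| | R | 38.32 | 75.52 | 86.10 | 93.87 | 96.24 |
| | P | 65.08 | 43.76 | 38.46 | 35.26 | 33.19 |
| Medium and heavy truck | A | 97.14 | 96.24 | 94.79 | 94.57 | 93.76 |
| | R | 51.72 | 79.52 | 91.69 | 93.50 | 97.37 |
| | P | 53.76 | 43.89 | 36.30 | 35.51 | 32.76 |
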
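The weights in Table 8 can be read as the training weight of an accident sample relative to a non-accident sample. A minimal sketch of building such per-sample weights follows; this reading of the ratios and both function names are assumptions, since the source does not show how the weights enter training.

```python
from typing import List, Sequence


def accident_sample_weights(labels: Sequence[int], accident_weight: float) -> List[float]:
    """Weight accident samples (label 1) `accident_weight` times a non-accident sample."""
    if accident_weight <= 0:
        raise ValueError("accident_weight must be positive")
    return [float(accident_weight) if y == 1 else 1.0 for y in labels]


def weighted_accident_share(labels: Sequence[int], weights: Sequence[float]) -> float:
    """Fraction of the total training weight carried by accident samples."""
    accident_total = sum(w for y, w in zip(labels, weights) if y == 1)
    return accident_total / sum(weights)


labels = [0, 0, 1, 0, 1]
weights = accident_sample_weights(labels, accident_weight=5)  # the 5:1 setting
share = weighted_accident_share(labels, weights)
```

A list like `weights` could be passed as the `sample_weight` argument of `fit` in scikit-learn-style estimators. Raising the accident weight pushes the model to flag more vehicles, which matches the trend in Table 8: recall rises and precision falls as the weight grows.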
Table 9. Accident sample weights for high, medium, and low accuracy submodels for different vehicle types
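| Vehicle type | High-accuracy sub-model | Medium-accuracy sub-model | Low-accuracy sub-model |
|---|---|---|---|
| Commercial passenger car | 1:1 | 3:1 | 5:1 |
| Non-commercial passenger car | 1:1 | 3:1 | 5:1 |
| Medium and large bus | 1:1 | 3:1 | 9:1 |
| Light and mini truck | 1:1 | 3:1 | 7:1 |
| Medium and heavy truck | 1:1 | 3:1 | 5:1 |
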
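The abstract states that logical rules combine the three sub-models into four risk levels but does not spell the rules out. One plausible reading is a cascade in which a vehicle receives the level of the strictest sub-model that flags it. The rule below is an assumption for illustration, and `fuse_risk_level` is a hypothetical name.

```python
RISK_LEVELS = {1: "high", 2: "medium", 3: "low", 4: "extremely low"}


def fuse_risk_level(high_flag: bool, medium_flag: bool, low_flag: bool) -> int:
    """Assumed cascade: check the sub-models from high accuracy to low accuracy.

    Each flag is the binary accident prediction of one sub-model.
    """
    if high_flag:
        return 1  # flagged even by the most conservative sub-model
    if medium_flag:
        return 2
    if low_flag:
        return 3
    return 4  # no sub-model predicts an accident


# Predictions of the (high, medium, low) sub-models for four example vehicles.
predictions = [(True, True, True), (False, True, True),
               (False, False, True), (False, False, False)]
levels = [fuse_risk_level(*flags) for flags in predictions]
names = [RISK_LEVELS[level] for level in levels]
```

This reading is consistent with one observable detail: the level-1 precision of the fusion model in Table 11 equals the precision at weight 1:1 in Table 8 for every vehicle type, as would be expected if level 1 is exactly the set flagged by the high-accuracy sub-model.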
Table 10. Overview of test set samples
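Unit: veh
| Vehicle type | Number | Actual accidents |
|---|---|---|
| Commercial passenger car | 16 026 | 3 934 |
| Non-commercial passenger car | 641 133 | 25 967 |
| Medium and large bus | 16 454 | 651 |
| Light and mini truck | 93 178 | 3 215 |
| Medium and heavy truck | 39 468 | 1 216 |
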
Table 11. Overall results of the hierarchical fusion recognition model for vehicle accident risks
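| Vehicle type | Risk level | Fusion model Pi/% | Fusion model Aa/% | Fusion model Ra/% | Best single model/dataset | Single model A/% | Single model R/% | Single model P/% |
|---|---|---|---|---|---|---|---|---|
| Commercial passenger car | 1 | 65.15 | 78.78 | 93.98 | XGBoost/1 year | 83.75 | 72.67 | 65.15 |
| | 2 | 35.28 | | | | | | |
| | 3 | 22.03 | | | | | | |
| | 4 | 99.61 | | | | | | 90.76 |
| Non-commercial passenger car | 1 | 63.63 | 91.72 | 72.01 | LightGBM/2 years | 94.76 | 49.83 | 38.64 |
| | 2 | 34.21 | | | | | | |
| | 3 | 23.30 | | | | | | |
| | 4 | 98.74 | | | | | | 97.86 |
| Medium and large bus | 1 | 66.25 | 92.85 | 96.31 | XGBoost/2 years | 94.62 | 62.35 | 66.25 |
| | 2 | 40.00 | | | | | | |
| | 3 | 28.06 | | | | | | |
| | 4 | 99.67 | | | | | | 96.85 |
| Light and mini truck | 1 | 65.08 | 93.80 | 93.25 | XGBoost/2 years | 96.04 | 72.78 | 45.36 |
| | 2 | 34.57 | | | | | | |
| | 3 | 16.67 | | | | | | |
| | 4 | 99.74 | | | | | | 99.01 |
| Medium and heavy truck | 1 | 53.76 | 93.80 | 90.13 | Decision tree/2 years | 96.90 | 50.90 | 49.64 |
| | 2 | 27.91 | | | | | | |
| | 3 | 9.30 | | | | | | |
| | 4 | 99.67 | | | | | | 98.44 |
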
Table 12. Comparison of important variables for each vehicle type
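| Variable | Commercial passenger car | Non-commercial passenger car | Medium and large bus | Light and mini truck | Medium and heavy truck |
|---|---|---|---|---|---|
| Accident frequency | ☆☆☆ | ☆☆☆ | ☆☆☆ | ☆☆ | ☆☆ |
| Vehicle age | ☆☆ | ☆ | ☆☆ | ☆ | ☆ |
| Vehicle status | ☆☆ | ☆ | ☆☆ | ☆ | ☆ |

The remaining variables are rated for only some vehicle types, and the column each rating belongs to cannot be recovered from the flattened layout. They are listed here in source order: usage type (☆☆); new energy vehicle (☆☆); ownership (☆☆); failure to slow down at pedestrian crossings (☆☆, ☆☆); overloading (☆); forced overtaking (☆☆); non-standard driving (☆☆); violating warning signs (☆☆, ☆, ☆); vehicle not meeting standards (☆☆, ☆).

Note: ☆☆☆ indicates strong influence; ☆☆ indicates moderate influence; ☆ indicates weak influence.
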
Table 13. Model robustness validation data
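| Vehicle type | Number/veh | Actual accidents/veh | Predictor variables | Validation data |
|---|---|---|---|---|
| Dangerous goods truck | 1 608 | 79 | Feature variables from 2021 and 2022 | Accidents in 2023 |
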
Table 14. Model robustness assessment results
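| Dataset | Risk level | Predicted vehicles/veh | Actual accident vehicles/veh | Pi/% | Aa/% | Ra/% |
|---|---|---|---|---|---|---|
| Original validation dataset | 1 | 1 170 | 629 | 53.76 | 93.80 | 90.13 |
| | 2 | 1 383 | 386 | 27.91 | | |
| | 3 | 869 | 81 | 9.30 | | |
| | 4 | 36 046 | 120 | 99.67 | | |
| Robustness validation dataset | 1 | 78 | 38 | 48.72 | 82.71 | 93.67 |
| | 2 | 108 | 22 | 20.37 | | |
| | 3 | 161 | 14 | 8.38 | | |
| | 4 | 1 261 | 5 | 99.60 | | |
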
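The summary figures in Table 14 can be reproduced from the per-level counts under an assumed reading of the metrics: levels 1 to 3 count as "accident predicted", level 4 as "no accident predicted", and Pi for level 4 is the share of vehicles without an accident. These definitions and the function names are assumptions inferred from the numbers, not stated in the source.

```python
from typing import List, Tuple

# (risk level, predicted vehicles, actual accident vehicles), robustness dataset, Table 14.
ROBUSTNESS_ROWS: List[Tuple[int, int, int]] = [
    (1, 78, 38), (2, 108, 22), (3, 161, 14), (4, 1261, 5),
]


def level_precision(level: int, predicted: int, accidents: int) -> float:
    """Pi in percent: accident share for levels 1-3, non-accident share for level 4."""
    correct = predicted - accidents if level == 4 else accidents
    return 100.0 * correct / predicted


def overall_accuracy_recall(rows: List[Tuple[int, int, int]]) -> Tuple[float, float]:
    """Aa and Ra in percent, treating levels 1-3 as positive predictions."""
    total_vehicles = sum(predicted for _, predicted, _ in rows)
    total_accidents = sum(accidents for _, _, accidents in rows)
    true_positives = sum(a for level, _, a in rows if level != 4)
    true_negatives = sum(p - a for level, p, a in rows if level == 4)
    accuracy = 100.0 * (true_positives + true_negatives) / total_vehicles
    recall = 100.0 * true_positives / total_accidents
    return accuracy, recall


accuracy, recall = overall_accuracy_recall(ROBUSTNESS_ROWS)
precisions = {level: level_precision(level, p, a) for level, p, a in ROBUSTNESS_ROWS}
```

Under this reading the sketch reproduces Aa = 82.71%, Ra = 93.67%, and Pi = 48.72%, 20.37%, and 99.60% for levels 1, 2, and 4. The level-3 value comes out as 8.70% rather than the tabulated 8.38%, so the exact definition used for that cell may differ.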
Table 15. Hierarchical management of risk vehicles
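| Number of risk vehicles/veh | Accident risk probability/% | Risk level | Management measure |
|---|---|---|---|
| 78 | 48.72 | High accident risk | In-person interview and continuous supervision |
| 108 | 20.37 | Medium accident risk | Telephone interview to strengthen safety awareness |
| 161 | 8.38 | Low accident risk | Text message reminder to drive safely |
| 1 261 | 0.40 | Extremely low accident risk | Low risk; no intervention; reassess after one year |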