Interpolation method of traffic volume missing data based on improved low-rank matrix completion
-
摘要: 提出了一种低秩矩阵补全的改进方法以研究道路交通量数据缺失值插补问题。应用基于核范数的低秩矩阵补全对交通量数据矩阵中的缺失值进行第1轮插补; 通过层次聚类算法将交通量数据划分为不同类别, 使得同类中的数据具有较强相关性, 异类中的数据具有较弱的相关性; 在每类样本上应用低秩矩阵补全得到缺失值的第2轮插补; 为了减少聚类数的影响, 提出最小二乘回归集成学习方法将不同聚类数下的插补结果进行融合, 得到最终的交通量数据插补结果; 用美国俄勒冈州波特兰市的交通量数据比较了5种方法的插补误差, 并分析了不同聚类数和距离度量方法的影响。研究结果表明: 在完全随机缺失模式下, 缺失率为10%~60%时, 其相对于传统的低秩矩阵补全模型的插补误差降低了5.93%~9.11%;在随机缺失和混合缺失模式下, 插补误差也分别降低了8.32%~9.55%和8.14%~9.20%;集成不同聚类数下的多个插补结果比单一聚类数下的插补误差降低2.62%~4.76%。可见, 在3种数据缺失模式下, 改进低秩矩阵补全方法降低了交通量数据的插补误差, 能有效提高插补后交通量数据的有效性。Abstract: An improved low-rank matrix completion method was proposed to study the interpolation problem of road traffic volume missing data. The missing data in the traffic volume data matrix were interpolated in the first round by the low-rank matrix completion based on the nuclear norm. Hierarchical clustering algorithm was applied to classify traffic volume data into different clusters so that the data in the same cluster had strong correlation, while the data in different clusters had weak correlation. Low-rank matrix completion method was applied to each cluster to complete the second round interpolation for missing data. In order to reduce the impact of clustering number, the least square regression ensemble learning approach was proposed to combine the interpolation results under different clustering numbers, so as to obtain the final traffic volume data interpolation results. The interpolation errors of five methods were compared based on the highway traffic volume data in Portland, Oregon, USA, and the influences of different clustering numbers and distance metrics methods were analyzed. Analysis result shows that under the completely random missing pattern, when the missing rate is 10%-60%, the interpolation error reduces by 5.93%-9.11% compared with the traditional low-rank matrix completion model. Under the random and mixed missing patterns, the interpolation errors reduce by 8.32%-9.55% and 8.14%-9.20%, respectively. The integration of multiple interpolations under different clustering numbers can reduce the interpolation error by 2.62%-4.76% compared with the results under single clustering number. Therefore, under three data missing modes, the improved low-rank matrix completion method can reduce the interpolation error of traffic volume data effectively, and improve the effectiveness of traffic volume data after interpolation.
-
表 1 MCAR模式下不同方法的插补误差
Table 1. Interpolation errors of different methods under MCAR pattern
缺失率/% 误差 MI SVD PPCA LRMC LLRMC-EN 平均值 方差 平均值 方差 平均值 方差 平均值 方差 平均值 方差 10 E1 25.50 0.16 13.71 0.11 11.45 0.06 12.04 0.07 11.19 0.08 E2 19.34 0.14 10.31 0.09 8.62 0.04 9.07 0.04 8.38 0.05 20 E1 25.43 0.04 14.94 0.05 11.71 0.05 12.42 0.06 11.49 0.08 E2 19.30 0.02 11.24 0.03 8.79 0.04 9.34 0.03 8.59 0.04 30 E1 25.36 0.06 16.26 0.07 12.05 0.06 12.89 0.06 11.86 0.06 E2 19.23 0.02 12.24 0.04 9.05 0.04 9.68 0.02 8.86 0.04 40 E1 25.36 0.03 17.75 0.06 12.57 0.07 13.51 0.06 12.40 0.08 E2 19.21 0.02 13.35 0.05 9.43 0.05 10.13 0.04 9.24 0.05 50 E1 25.36 0.03 19.31 0.09 13.28 0.05 14.32 0.06 13.13 0.05 E2 19.21 0.01 14.52 0.06 9.95 0.03 10.71 0.05 9.76 0.04 60 E1 25.39 0.01 20.93 0.08 14.45 0.04 15.44 0.06 14.18 0.12 E2 19.21 0.00 15.71 0.06 10.78 0.03 11.52 0.05 10.47 0.08 表 2 MAR模式下不同方法的插补误差
Table 2. Interpolation errors of different methods under MAR pattern
表 3 混合缺失模式下不同方法的插补误差
Table 3. Interpolation errors of different methods under mixed missing pattern
-
[1] LI Lin-chao, ZHANG Jian, WANG Yong-gang, et al. Missing value imputation for traffic-related time series data based on a multi-view learning method[J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 20(8): 2933-2943. doi: 10.1109/TITS.2018.2869768 [2] WANG Hai, DAI Lei, CAI Ying-feng, et al. Salient object detection based on multi-scale contrast[J]. Neural Networks, 2018, 101: 47-56. doi: 10.1016/j.neunet.2018.02.005 [3] LYU Yi-sheng, DUAN Yan-jie, KANG Wen-wen, et al. Traffic flow prediction with big data: a deep learning approach[J]. IEEE Transactions on Intelligent Transportation Systems, 2015, 16(2): 865-873. [4] 许岩岩, 翟希, 孔庆杰, 等. 高速路交通流短时预测方法[J]. 交通运输工程学报, 2013, 13(2): 114-119. doi: 10.3969/j.issn.1671-1637.2013.02.017XU Yan-yan, ZHAI Xi, KONG Qing-jie, et al. Short-term prediction method of freeway traffic flow[J]. Journal of Traffic and Transportation Engineering, 2013, 13(2): 114-119. (in Chinese). doi: 10.3969/j.issn.1671-1637.2013.02.017 [5] 陈小波, 刘祥, 韦中杰, 等. 基于GA-LSSVR模型的路网短时交通流预测研究[J]. 交通运输系统工程与信息, 2017, 17(1): 60-66, 81. https://www.cnki.com.cn/Article/CJFDTOTAL-YSXT201701011.htmCHEN Xiao-bo, LIU Xiang, WEI Zhong-jie, et al. Short-term traffic flow forecasting of road network based on GA-LSSVR model[J]. Journal of Transportation Systems Engineering and Information Technology, 2017, 17(1): 60-66, 81. (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-YSXT201701011.htm [6] 雷定猷, 马强, 徐新平, 等. 基于非线性主成分分析和GA-RBF的高速公路交通量预测方法[J]. 交通运输工程学报, 2018, 18(3): 210-217. doi: 10.3969/j.issn.1671-1637.2018.03.022LEI Ding-you, MA Qiang, XU Xin-ping, et al. Forecasting method of expressway traffic volume based on NPCA and GA-RBF[J]. Journal of Traffic and Transportation Engineering, 2018, 18(3): 210-217. (in Chinese). doi: 10.3969/j.issn.1671-1637.2018.03.022 [7] ZHANG Jun-ping, WANG Fei-yue, WANG Kun-feng, et al. Data-driven intelligent transportation systems: a survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2011, 12(4): 1624-1639. doi: 10.1109/TITS.2011.2158001 [8] 孟鸿程, 陈淑燕. 交通流缺失数据处理方法比较分析[J]. 交通信息与安全, 2018, 36(2): 61-67. doi: 10.3963/j.issn.1674-4861.2018.02.009MENG Hong-cheng, CHEN Shu-yan. A comparative analysis of data imputation methods for missing traffic flow data[J]. Journal of Transport Information and Safety, 2018, 36(2): 61-67. (in Chinese). doi: 10.3963/j.issn.1674-4861.2018.02.009 [9] 李林超, 曲栩, 张健, 等. 基于特征级融合的高速公路异质交通流数据修复方法[J]. 东南大学学报(自然科学版), 2018, 48(5): 972-978. https://www.cnki.com.cn/Article/CJFDTOTAL-DNDX201805029.htmLI Lin-chao, QU Xu, ZHANG Jian, et al. Missing value imputation method for heterogeneous traffic flow data based on feature fusion[J]. Journal of Southeast University (Natural Science Edition), 2018, 48(5): 972-978. (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-DNDX201805029.htm [10] 陆化普, 孙智源, 屈闻聪. 基于时空模型的交通流故障数据修正方法[J]. 交通运输工程学报, 2015, 15(6): 92-100, 117. doi: 10.3969/j.issn.1671-1637.2015.06.012LU Hua-pu, SUN Zhi-yuan, QU Wen-cong. Repair method of traffic flow malfunction data based on temporal-spatial model[J]. Journal of Traffic and Transportation Engineering, 2015, 15(6): 92-100, 117. (in Chinese). doi: 10.3969/j.issn.1671-1637.2015.06.012 [11] 孙玲, 刘浩, 牛树云. 考虑时空相关性的固定检测缺失数据重构算法[J]. 交通运输工程学报, 2010, 10(5): 121-126. doi: 10.3969/j.issn.1671-1637.2010.05.021SUN Ling, LIU Hao, NIU Shu-yun. Reconstructive method of missing data for location-specific detector considering spatio-temporal relationship[J]. Journal of Traffic and Transportation Engineering, 2010, 10(5): 121-126. (in Chinese). doi: 10.3969/j.issn.1671-1637.2010.05.021 [12] CHEN Xiao-bo, CHEN Cheng, CAI Ying-feng, et al. Kernel sparse representation with hybrid regularization for on-road traffic sensor data imputation[J]. Sensors, 2018, 18: 1-20. doi: 10.1109/JSEN.2018.2870221 [13] CHEN Xiao-bo, CAI Ying-feng, YE Qiao-lin, et al. Graph regularized local self-representation for missing value imputation with applications to on-road traffic sensor data[J]. Neurocomputing, 2018, 303: 47-59. doi: 10.1016/j.neucom.2018.04.029 [14] LI Yue-biao, LI Zhi-heng, LI Li. Missing traffic data: comparison of imputation methods[J]. IET Intelligent Transport Systems, 2014, 8(1): 51-57. doi: 10.1049/iet-its.2013.0052 [15] QU Li, LI Li, ZHANG Yi, et al. PPCA-based missing data imputation for traffic flow volume: a systematical approach[J]. IEEE Transactions on Intelligent Transportation Systems, 2009, 10(3): 512-522. doi: 10.1109/TITS.2009.2026312 [16] 钱超, 陈建勋, 罗彦斌, 等. 基于随机森林的公路隧道运营缺失数据插补方法[J]. 交通运输系统工程与信息, 2016, 16(3): 81-87. doi: 10.3969/j.issn.1009-6744.2016.03.012QIAN Chao, CHEN Jian-xun, LUO Yan-bin, et al. Random forest based operational missing data imputation for highway tunnel[J]. Journal of Transportation Engineering and Information Technology, 2016, 16(3): 81-87. (in Chinese). doi: 10.3969/j.issn.1009-6744.2016.03.012 [17] AHN J, KO E, KIM E Y. Highway traffic flow prediction using support vector regression and Bayesian classifier[C]//IEEE. 2016 International Conference on Big Data and Smart Computing. New York: IEEE, 2016: 239-244. [18] ASIF M T, MITROVIC N, GARG L, et al. Low-dimensional models for missing data imputation in road networks[C]//IEEE. The 38th IEEE International Conference on Acoustics, Speech, and Signal Processing. New York: IEEE, 2013: 3527-3531. [19] TAN Hua-chun, FENG Guang-dong, FENG Jian-shuai, et al. A tensor-based method for missing traffic data completion[J]. Transportation Research Part C: Emerging Technologies, 2013, 28: 15-27. doi: 10.1016/j.trc.2012.12.007 [20] CHEN Xin-yu, HE Zhao-cheng, CHEN Yi-xian, et al. Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model[J]. Transportation Research Part C: Emerging Technologies, 2019, 104: 66-77. doi: 10.1016/j.trc.2019.03.003 [21] 刘园园. 快速低秩矩阵与张量恢复的算法研究[D]. 西安: 西安电子科技大学, 2013.LIU Yuan-yuan. Algorithm research of fast low-rank matrix and tensor recovery[D]. Xi'an: Xidian University, 2013. (in Chinese). [22] CANDÈS E J, RECHT B. Exact matrix completion via convex optimization[J]. Foundations of Computational Mathematics, 2009, 9(6): 717-772. doi: 10.1007/s10208-009-9045-5 [23] CAI Jian-feng, CANDÈS E J, SHEN Zuo-wei. A singular value thresholding algorithm for matrix completion[J]. SIAM Journal on Optimization, 2010, 20(4): 1956-1982. doi: 10.1137/080738970 [24] HASTIE T, MAZUMDER R, LEE J D, et al. Matrix completion and low-rank SVD via fast alternating least squares[J]. Journal of Machine Learning Research, 2015, 16: 3367-3402. [25] MA Shi-qian, GOLDFARB D, CHEN Li-feng. Fixed point and Bregman iterative methods for matrix rank minimization[J]. Mathematical Programming, 2009, 128(1/2): 321-353. [26] CHEN Xiao-bo, XIAO Yan, CAI Yin-feng, et al. Structural max-margin discriminant analysis for feature extraction[J]. Knowledge-Based Systems, 2014, 70: 154-166. doi: 10.1016/j.knosys.2014.06.020 [27] MURTAGH F, LEGENDRE P. Ward's hierarchical agglomerative clustering method: which algorithms implement ward's criterion?[J]. Journal of Classification, 2014, 31: 274-295. [28] FARHANGFAR A, KURGAN L A, PEDRYCZ W. A novel framework for imputation of missing values in databases[J]. IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans, 2007, 37(5): 692-709. [29] CHAI T, DRAXLER R R. Root mean square error (RMSE) or mean absolute error (MAE) —Arguments against avoiding RMSE in the literature[J]. Geoscientific Model Development, 2014, 7: 1247-1250. [30] ZHENG Zu-duo, SU Dong-cai, et al. Short-term traffic volume forecasting: a k-nearest neighbor approach enhanced by constrained linearly sewing principle component algorithm[J]. Transportation Research Part C: Emerging Technologies, 2014, 43: 143-157. -