留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于改进低秩矩阵补全的交通量数据缺失值插补方法

陈小波 陈程 陈蕾 韦中杰 蔡英凤 周俊杰

陈小波, 陈程, 陈蕾, 韦中杰, 蔡英凤, 周俊杰. 基于改进低秩矩阵补全的交通量数据缺失值插补方法[J]. 交通运输工程学报, 2019, 19(5): 180-190. doi: 10.19818/j.cnki.1671-1637.2019.05.018
引用本文: 陈小波, 陈程, 陈蕾, 韦中杰, 蔡英凤, 周俊杰. 基于改进低秩矩阵补全的交通量数据缺失值插补方法[J]. 交通运输工程学报, 2019, 19(5): 180-190. doi: 10.19818/j.cnki.1671-1637.2019.05.018
CHEN Xiao-bo, CHEN Cheng, CHEN Lei, WEI Zhong-jie, CAI Ying-feng, ZHOU Jun-jie. Interpolation method of traffic volume missing data based on improved low-rank matrix completion[J]. Journal of Traffic and Transportation Engineering, 2019, 19(5): 180-190. doi: 10.19818/j.cnki.1671-1637.2019.05.018
Citation: CHEN Xiao-bo, CHEN Cheng, CHEN Lei, WEI Zhong-jie, CAI Ying-feng, ZHOU Jun-jie. Interpolation method of traffic volume missing data based on improved low-rank matrix completion[J]. Journal of Traffic and Transportation Engineering, 2019, 19(5): 180-190. doi: 10.19818/j.cnki.1671-1637.2019.05.018

基于改进低秩矩阵补全的交通量数据缺失值插补方法

doi: 10.19818/j.cnki.1671-1637.2019.05.018
基金项目: 

国家自然科学基金项目 61773184

国家自然科学基金项目 51875255

国家自然科学基金项目 61572241

江苏省大数据安全与智能处理重点实验室开放课题 BDSIP1802

详细信息
    作者简介:

    陈小波(1982-), 男, 湖北仙桃人, 江苏大学副教授, 工学博士, 从事机器学习与智能交通研究

  • 中图分类号: U491.1

Interpolation method of traffic volume missing data based on improved low-rank matrix completion

More Information
Article Text (Baidu Translation)
  • 摘要: 提出了一种低秩矩阵补全的改进方法以研究道路交通量数据缺失值插补问题。应用基于核范数的低秩矩阵补全对交通量数据矩阵中的缺失值进行第1轮插补; 通过层次聚类算法将交通量数据划分为不同类别, 使得同类中的数据具有较强相关性, 异类中的数据具有较弱的相关性; 在每类样本上应用低秩矩阵补全得到缺失值的第2轮插补; 为了减少聚类数的影响, 提出最小二乘回归集成学习方法将不同聚类数下的插补结果进行融合, 得到最终的交通量数据插补结果; 用美国俄勒冈州波特兰市的交通量数据比较了5种方法的插补误差, 并分析了不同聚类数和距离度量方法的影响。研究结果表明: 在完全随机缺失模式下, 缺失率为10%~60%时, 其相对于传统的低秩矩阵补全模型的插补误差降低了5.93%~9.11%;在随机缺失和混合缺失模式下, 插补误差也分别降低了8.32%~9.55%和8.14%~9.20%;集成不同聚类数下的多个插补结果比单一聚类数下的插补误差降低2.62%~4.76%。可见, 在3种数据缺失模式下, 改进低秩矩阵补全方法降低了交通量数据的插补误差, 能有效提高插补后交通量数据的有效性。

     

  • 图  1  局部低秩矩阵补全集成学习方法

    Figure  1.  Ensemble learning method of local low-rank matrix completion

    图  2  凝聚型层次聚类算法

    Figure  2.  Cohesive hierarchical clustering algorithm

    图  3  检测站分布

    Figure  3.  Distribution of detection stations

    图  4  奇异值变化过程

    Figure  4.  Variation process of singular value

    图  5  最相关和最不相关交通样本

    Figure  5.  Most relevant and least relevant traffic samples

    图  6  3种缺失模式

    Figure  6.  Three missing patterns

    图  7  缺失率为10%时, MCAR和MAR模式下E1k变化曲线

    Figure  7.  Changing curves of E1 with k under MCAR and MAR patterns when missing rate is 10%

    图  8  缺失率60%为时, MCAR和MAR模式下E1k变化曲线

    Figure  8.  Changing curves of E1 with k under MCAR and MAR patterns when missing rate is 60%

    图  9  MCAR模式下均方根误差

    Figure  9.  RMSEs under MCAR pattern

    图  10  MAR模式下均方根误差

    Figure  10.  RMSEs under MAR pattern

    图  11  混合缺失模式下均方根误差

    Figure  11.  RMSEs under mixed missing pattern

    表  1  MCAR模式下不同方法的插补误差

    Table  1.   Interpolation errors of different methods under MCAR pattern

    缺失率/% 误差 MI SVD PPCA LRMC LLRMC-EN
    平均值 方差 平均值 方差 平均值 方差 平均值 方差 平均值 方差
    10 E1 25.50 0.16 13.71 0.11 11.45 0.06 12.04 0.07 11.19 0.08
    E2 19.34 0.14 10.31 0.09 8.62 0.04 9.07 0.04 8.38 0.05
    20 E1 25.43 0.04 14.94 0.05 11.71 0.05 12.42 0.06 11.49 0.08
    E2 19.30 0.02 11.24 0.03 8.79 0.04 9.34 0.03 8.59 0.04
    30 E1 25.36 0.06 16.26 0.07 12.05 0.06 12.89 0.06 11.86 0.06
    E2 19.23 0.02 12.24 0.04 9.05 0.04 9.68 0.02 8.86 0.04
    40 E1 25.36 0.03 17.75 0.06 12.57 0.07 13.51 0.06 12.40 0.08
    E2 19.21 0.02 13.35 0.05 9.43 0.05 10.13 0.04 9.24 0.05
    50 E1 25.36 0.03 19.31 0.09 13.28 0.05 14.32 0.06 13.13 0.05
    E2 19.21 0.01 14.52 0.06 9.95 0.03 10.71 0.05 9.76 0.04
    60 E1 25.39 0.01 20.93 0.08 14.45 0.04 15.44 0.06 14.18 0.12
    E2 19.21 0.00 15.71 0.06 10.78 0.03 11.52 0.05 10.47 0.08
    下载: 导出CSV

    表  2  MAR模式下不同方法的插补误差

    Table  2.   Interpolation errors of different methods under MAR pattern

    下载: 导出CSV

    表  3  混合缺失模式下不同方法的插补误差

    Table  3.   Interpolation errors of different methods under mixed missing pattern

    下载: 导出CSV
  • [1] LI Lin-chao, ZHANG Jian, WANG Yong-gang, et al. Missing value imputation for traffic-related time series data based on a multi-view learning method[J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 20(8): 2933-2943. doi: 10.1109/TITS.2018.2869768
    [2] WANG Hai, DAI Lei, CAI Ying-feng, et al. Salient object detection based on multi-scale contrast[J]. Neural Networks, 2018, 101: 47-56. doi: 10.1016/j.neunet.2018.02.005
    [3] LYU Yi-sheng, DUAN Yan-jie, KANG Wen-wen, et al. Traffic flow prediction with big data: a deep learning approach[J]. IEEE Transactions on Intelligent Transportation Systems, 2015, 16(2): 865-873.
    [4] 许岩岩, 翟希, 孔庆杰, 等. 高速路交通流短时预测方法[J]. 交通运输工程学报, 2013, 13(2): 114-119. doi: 10.3969/j.issn.1671-1637.2013.02.017

    XU Yan-yan, ZHAI Xi, KONG Qing-jie, et al. Short-term prediction method of freeway traffic flow[J]. Journal of Traffic and Transportation Engineering, 2013, 13(2): 114-119. (in Chinese). doi: 10.3969/j.issn.1671-1637.2013.02.017
    [5] 陈小波, 刘祥, 韦中杰, 等. 基于GA-LSSVR模型的路网短时交通流预测研究[J]. 交通运输系统工程与信息, 2017, 17(1): 60-66, 81. https://www.cnki.com.cn/Article/CJFDTOTAL-YSXT201701011.htm

    CHEN Xiao-bo, LIU Xiang, WEI Zhong-jie, et al. Short-term traffic flow forecasting of road network based on GA-LSSVR model[J]. Journal of Transportation Systems Engineering and Information Technology, 2017, 17(1): 60-66, 81. (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-YSXT201701011.htm
    [6] 雷定猷, 马强, 徐新平, 等. 基于非线性主成分分析和GA-RBF的高速公路交通量预测方法[J]. 交通运输工程学报, 2018, 18(3): 210-217. doi: 10.3969/j.issn.1671-1637.2018.03.022

    LEI Ding-you, MA Qiang, XU Xin-ping, et al. Forecasting method of expressway traffic volume based on NPCA and GA-RBF[J]. Journal of Traffic and Transportation Engineering, 2018, 18(3): 210-217. (in Chinese). doi: 10.3969/j.issn.1671-1637.2018.03.022
    [7] ZHANG Jun-ping, WANG Fei-yue, WANG Kun-feng, et al. Data-driven intelligent transportation systems: a survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2011, 12(4): 1624-1639. doi: 10.1109/TITS.2011.2158001
    [8] 孟鸿程, 陈淑燕. 交通流缺失数据处理方法比较分析[J]. 交通信息与安全, 2018, 36(2): 61-67. doi: 10.3963/j.issn.1674-4861.2018.02.009

    MENG Hong-cheng, CHEN Shu-yan. A comparative analysis of data imputation methods for missing traffic flow data[J]. Journal of Transport Information and Safety, 2018, 36(2): 61-67. (in Chinese). doi: 10.3963/j.issn.1674-4861.2018.02.009
    [9] 李林超, 曲栩, 张健, 等. 基于特征级融合的高速公路异质交通流数据修复方法[J]. 东南大学学报(自然科学版), 2018, 48(5): 972-978. https://www.cnki.com.cn/Article/CJFDTOTAL-DNDX201805029.htm

    LI Lin-chao, QU Xu, ZHANG Jian, et al. Missing value imputation method for heterogeneous traffic flow data based on feature fusion[J]. Journal of Southeast University (Natural Science Edition), 2018, 48(5): 972-978. (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-DNDX201805029.htm
    [10] 陆化普, 孙智源, 屈闻聪. 基于时空模型的交通流故障数据修正方法[J]. 交通运输工程学报, 2015, 15(6): 92-100, 117. doi: 10.3969/j.issn.1671-1637.2015.06.012

    LU Hua-pu, SUN Zhi-yuan, QU Wen-cong. Repair method of traffic flow malfunction data based on temporal-spatial model[J]. Journal of Traffic and Transportation Engineering, 2015, 15(6): 92-100, 117. (in Chinese). doi: 10.3969/j.issn.1671-1637.2015.06.012
    [11] 孙玲, 刘浩, 牛树云. 考虑时空相关性的固定检测缺失数据重构算法[J]. 交通运输工程学报, 2010, 10(5): 121-126. doi: 10.3969/j.issn.1671-1637.2010.05.021

    SUN Ling, LIU Hao, NIU Shu-yun. Reconstructive method of missing data for location-specific detector considering spatio-temporal relationship[J]. Journal of Traffic and Transportation Engineering, 2010, 10(5): 121-126. (in Chinese). doi: 10.3969/j.issn.1671-1637.2010.05.021
    [12] CHEN Xiao-bo, CHEN Cheng, CAI Ying-feng, et al. Kernel sparse representation with hybrid regularization for on-road traffic sensor data imputation[J]. Sensors, 2018, 18: 1-20. doi: 10.1109/JSEN.2018.2870221
    [13] CHEN Xiao-bo, CAI Ying-feng, YE Qiao-lin, et al. Graph regularized local self-representation for missing value imputation with applications to on-road traffic sensor data[J]. Neurocomputing, 2018, 303: 47-59. doi: 10.1016/j.neucom.2018.04.029
    [14] LI Yue-biao, LI Zhi-heng, LI Li. Missing traffic data: comparison of imputation methods[J]. IET Intelligent Transport Systems, 2014, 8(1): 51-57. doi: 10.1049/iet-its.2013.0052
    [15] QU Li, LI Li, ZHANG Yi, et al. PPCA-based missing data imputation for traffic flow volume: a systematical approach[J]. IEEE Transactions on Intelligent Transportation Systems, 2009, 10(3): 512-522. doi: 10.1109/TITS.2009.2026312
    [16] 钱超, 陈建勋, 罗彦斌, 等. 基于随机森林的公路隧道运营缺失数据插补方法[J]. 交通运输系统工程与信息, 2016, 16(3): 81-87. doi: 10.3969/j.issn.1009-6744.2016.03.012

    QIAN Chao, CHEN Jian-xun, LUO Yan-bin, et al. Random forest based operational missing data imputation for highway tunnel[J]. Journal of Transportation Engineering and Information Technology, 2016, 16(3): 81-87. (in Chinese). doi: 10.3969/j.issn.1009-6744.2016.03.012
    [17] AHN J, KO E, KIM E Y. Highway traffic flow prediction using support vector regression and Bayesian classifier[C]//IEEE. 2016 International Conference on Big Data and Smart Computing. New York: IEEE, 2016: 239-244.
    [18] ASIF M T, MITROVIC N, GARG L, et al. Low-dimensional models for missing data imputation in road networks[C]//IEEE. The 38th IEEE International Conference on Acoustics, Speech, and Signal Processing. New York: IEEE, 2013: 3527-3531.
    [19] TAN Hua-chun, FENG Guang-dong, FENG Jian-shuai, et al. A tensor-based method for missing traffic data completion[J]. Transportation Research Part C: Emerging Technologies, 2013, 28: 15-27. doi: 10.1016/j.trc.2012.12.007
    [20] CHEN Xin-yu, HE Zhao-cheng, CHEN Yi-xian, et al. Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model[J]. Transportation Research Part C: Emerging Technologies, 2019, 104: 66-77. doi: 10.1016/j.trc.2019.03.003
    [21] 刘园园. 快速低秩矩阵与张量恢复的算法研究[D]. 西安: 西安电子科技大学, 2013.

    LIU Yuan-yuan. Algorithm research of fast low-rank matrix and tensor recovery[D]. Xi'an: Xidian University, 2013. (in Chinese).
    [22] CANDÈS E J, RECHT B. Exact matrix completion via convex optimization[J]. Foundations of Computational Mathematics, 2009, 9(6): 717-772. doi: 10.1007/s10208-009-9045-5
    [23] CAI Jian-feng, CANDÈS E J, SHEN Zuo-wei. A singular value thresholding algorithm for matrix completion[J]. SIAM Journal on Optimization, 2010, 20(4): 1956-1982. doi: 10.1137/080738970
    [24] HASTIE T, MAZUMDER R, LEE J D, et al. Matrix completion and low-rank SVD via fast alternating least squares[J]. Journal of Machine Learning Research, 2015, 16: 3367-3402.
    [25] MA Shi-qian, GOLDFARB D, CHEN Li-feng. Fixed point and Bregman iterative methods for matrix rank minimization[J]. Mathematical Programming, 2009, 128(1/2): 321-353.
    [26] CHEN Xiao-bo, XIAO Yan, CAI Yin-feng, et al. Structural max-margin discriminant analysis for feature extraction[J]. Knowledge-Based Systems, 2014, 70: 154-166. doi: 10.1016/j.knosys.2014.06.020
    [27] MURTAGH F, LEGENDRE P. Ward's hierarchical agglomerative clustering method: which algorithms implement ward's criterion?[J]. Journal of Classification, 2014, 31: 274-295.
    [28] FARHANGFAR A, KURGAN L A, PEDRYCZ W. A novel framework for imputation of missing values in databases[J]. IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans, 2007, 37(5): 692-709.
    [29] CHAI T, DRAXLER R R. Root mean square error (RMSE) or mean absolute error (MAE) —Arguments against avoiding RMSE in the literature[J]. Geoscientific Model Development, 2014, 7: 1247-1250.
    [30] ZHENG Zu-duo, SU Dong-cai, et al. Short-term traffic volume forecasting: a k-nearest neighbor approach enhanced by constrained linearly sewing principle component algorithm[J]. Transportation Research Part C: Emerging Technologies, 2014, 43: 143-157.
  • 加载中
图(11) / 表(3)
计量
  • 文章访问数:  1547
  • HTML全文浏览量:  414
  • PDF下载量:  551
  • 被引次数: 0
出版历程
  • 收稿日期:  2019-04-20
  • 刊出日期:  2019-10-25

目录

    /

    返回文章
    返回