元胞推演多步决策的交叉口深度强化学习信号连续控制

赵红星 王钰捷 聂江龙 梁瑞艳 何瑞春

赵红星, 王钰捷, 聂江龙, 梁瑞艳, 何瑞春. 元胞推演多步决策的交叉口深度强化学习信号连续控制[J]. 交通运输工程学报, 2025, 25(4): 296-310. doi: 10.19818/j.cnki.1671-1637.2025.04.021
ZHAO Hong-xing, WANG Yu-jie, NIE Jiang-long, LIANG Rui-yan, HE Rui-chun. Deep reinforcement learning signal continuous control of intersection based on cellular deduction multi-step decision mechanism[J]. Journal of Traffic and Transportation Engineering, 2025, 25(4): 296-310. doi: 10.19818/j.cnki.1671-1637.2025.04.021


doi: 10.19818/j.cnki.1671-1637.2025.04.021
基金项目: 

国家自然科学基金 52162041

甘肃省自然科学基金 24JRRA255

兰州市青年科技人才创新计划 2023-QN-125

Detailed information
    About the authors:

    ZHAO Hong-xing (1989-), male, from Lanzhou, Gansu; associate professor at Lanzhou Jiaotong University; PhD in management; research interests: intelligent transportation systems, traffic energy consumption and emissions

    Corresponding author:

    HE Rui-chun (1969-), female, from Lintao, Gansu; professor at Lanzhou Jiaotong University; PhD

  • CLC number: U495

Deep reinforcement learning signal continuous control of intersection based on cellular deduction multi-step decision mechanism

Funds: 

National Natural Science Foundation of China 52162041

Natural Science Foundation of Gansu Province 24JRRA255

Lanzhou Youth Science and Technology Talent Innovation Program 2023-QN-125

More Information
    Corresponding author: HE Rui-chun(1969-), female, professor, PhD, tranman@163.com
  • Abstract: To address the limitation that, in most existing deep reinforcement learning (DRL) based adaptive signal control models, the agent can only exert discrete control over the signal lights according to the current state, a multi-step decision mechanism was introduced to build a DRL continuous signal control model for intersections. Cellular deduction was used to simulate the movement and transformation of traffic flow at the intersection and thereby realize the state transition of the intersection. The state obtained from the cellular deduction was, after feature extraction, concatenated with the current green phase and the vehicle arrival and departure rates of the previous decision cycle to form the state input of the model, improving the accuracy of the agent's decisions. The multi-step decision mechanism produced pre-decisions for the four phases, which were then integrated and passed to the signal lights to achieve adaptive continuous signal control. To verify the applicability of the proposed model, simulation analysis was carried out on the SUMO platform, and the model was compared with five other models under different experimental scenarios using measured intersection flow data. The results show that, in four different traffic scenarios, the optimization effect of the proposed model is comparable to that of DRL signal control models based on discrete grid states; compared with DRL signal control models based on feature-value vector states, it reduces the average waiting time and fuel consumption by at least about 9.80% and 4.56%, respectively; and compared with the traditional Webster timing model, it reduces them by at least about 9.30% and 4.67%, respectively. The proposed model thus achieves continuous traffic signal control with good stability and adaptability, which is of positive significance for advancing the practical application of DRL signal control.
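
    The decision loop summarized in the abstract can be illustrated with a minimal Python sketch. All names, the candidate-duration set, and the way pre-decisions are chained are assumptions made for illustration only (the paper's actual state features, network architecture, action definition, and integration rule are given in the full text). The sketch assumes a value-based (double-DQN-style) agent that, once per decision cycle, builds the state input by concatenating the feature-extracted cell-deduction state with the current green phase and the previous cycle's arrival and departure rates, then makes a pre-decision for each of the four phases while rolling the cell-deduction model forward between steps.

    import numpy as np

    N_PHASES = 4
    DURATIONS = np.arange(5, 61, 5)  # hypothetical candidate green durations (s) for a pre-decision

    def build_state(cell_features, current_phase, arrival_rate, departure_rate):
        """State input: feature-extracted cell-deduction state, one-hot current phase,
        and the previous decision cycle's arrival and departure rates."""
        phase_onehot = np.eye(N_PHASES)[current_phase]
        return np.concatenate([cell_features, phase_onehot, [arrival_rate, departure_rate]])

    def multi_step_decision(q_values_fn, deduce_fn, state):
        """Pre-decide a green duration for each of the four phases in turn, advancing the
        cell-deduction model between steps so later pre-decisions are made on predicted
        states; the integrated plan is then passed to the signal controller."""
        plan = []
        for phase in range(N_PHASES):
            q = q_values_fn(state)                      # one Q-value per candidate duration
            duration = int(DURATIONS[int(np.argmax(q))])
            plan.append((phase, duration))
            state = deduce_fn(state, phase, duration)   # cell-transmission state transition
        return plan

    # Toy usage with random stand-ins for the trained Q-network and the cell model.
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        s0 = build_state(cell_features=rng.random(64), current_phase=0,
                         arrival_rate=0.35, departure_rate=0.30)
        print(multi_step_decision(lambda s: rng.random(len(DURATIONS)),
                                  lambda s, p, d: s, s0))

    Deciding all four phases from deduced rather than observed states is what lets one interaction with the environment yield a full multi-phase plan, which is how the abstract describes realizing continuous signal control.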

     

  • 图  1  交叉口及相位方案设置

    Figure  1.  Intersection and phase scheme settings

    图  2  基于DRL的自适应信号控制模型

    Figure  2.  Adaptive signal control model based on DRL

    图  3  道路离散化

    Figure  3.  Road discretization

    图  4  状态处理过程

    Figure  4.  State processing procedure

    图  5  相位保持元胞推演过程

    Figure  5.  Phase-preserving cellular deduction process

    图  6  CTMDRL模型输出

    Figure  6.  Output of CTM DRL model

    图  7  基于2种不同算法的DRL信号控制神经网络结构

    Figure  7.  Structure of DRL signal control neural network based on two different algorithms

    图  8  4种训练场景下累计奖励

    Figure  8.  Cumulative rewards in four training scenarios

    图  9  4种训练场景下平均等待时间

    Figure  9.  Average waiting time in four training scenarios

    图  10  4种交通场景下车辆平均等待时间

    Figure  10.  Average waiting time of vehicles in four traffic scenarios

    图  11  4种交通场景下车辆平均油耗

    Figure  11.  Average fuel consumption of vehicles in four traffic scenarios

    表  1  中华南大街与成峰路交叉口8:00~12:00交通流数据

    Table  1.   Traffic flow data at the intersection of Zhonghua South Street and Chengfeng Road from 8:00 to 12:00 (unit: veh)

    Time period 8:00~9:00 9:00~10:00 10:00~11:00 11:00~12:00
    East-to-west through 497 422 403 446
    East-to-south left turn 231 246 212 227
    East-to-north right turn 372 332 322 300
    West-to-east through 444 403 408 418
    West-to-north left turn 155 97 124 154
    West-to-south right turn 317 284 261 301
    South-to-north through 368 302 278 319
    South-to-west left turn 198 176 172 203
    South-to-east right turn 169 164 162 179
    North-to-south through 462 405 379 394
    North-to-east left turn 182 161 185 203
    North-to-west right turn 189 182 190 206
    Total 3 584 3 174 3 096 3 350

    表  2  模型超参数设置

    Table  2.   Hyper-parameter settings of model

    Parameter Value
    Replay buffer capacity 120 000
    Mini-batch size 128
    Discount factor 0.9
    Q-network learning rate 3×10⁻⁴
    Target network update rate 0.005
    Initial exploration probability 1.0
    Final exploration probability 0.05
    Exploration probability decay rate 5×10⁻⁴
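
    For orientation, the values in Table 2 correspond to a standard ε-greedy double-DQN training setup. The sketch below merely collects them into a configuration object with an exponential ε-decay schedule; the field names and the exact decay formula are illustrative assumptions, not taken from the paper.

    import math
    from dataclasses import dataclass

    @dataclass
    class AgentConfig:
        """Hyperparameters of Table 2 arranged as a training configuration (illustrative only)."""
        replay_capacity: int = 120_000   # experience replay buffer capacity
        batch_size: int = 128            # sampling mini-batch size
        gamma: float = 0.9               # discount factor
        lr: float = 3e-4                 # Q-network learning rate
        tau: float = 0.005               # target network (soft) update rate
        eps_start: float = 1.0           # initial exploration probability
        eps_end: float = 0.05            # final exploration probability
        eps_decay: float = 5e-4          # exploration probability decay rate

    def epsilon(cfg: AgentConfig, step: int) -> float:
        """One common exponential decay schedule consistent with the listed start/end/decay values."""
        return cfg.eps_end + (cfg.eps_start - cfg.eps_end) * math.exp(-cfg.eps_decay * step)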

    表  3  4种训练场景下各模型训练效果对比

    Table  3.   Comparison of training effects of various models in four training scenarios

    Training scenario Indicator CTMD3QN CTMDDQN DGD3QN DGDDQN FVD3QN FVDDQN
    Scenario 1 Convergence episode 30 30 30 30 160 170
    Average cumulative reward after convergence/10⁵ s -1.58 -1.50 -1.45 -1.67 -1.75 -1.73
    Average waiting time after convergence/s 46.24 43.96 42.24 48.56 51.17 50.59
    Scenario 2 Convergence episode 20 20 20 60 100 100
    Average cumulative reward after convergence/10⁵ s -0.93 -0.93 -0.93 -0.93 -1.13 -1.30
    Average waiting time after convergence/s 30.69 30.65 30.52 30.47 37.36 42.75
    Scenario 3 Convergence episode 20 20 20 20 120 120
    Average cumulative reward after convergence/10⁵ s -0.84 -0.86 -0.76 -0.81 -0.95 -0.87
    Average waiting time after convergence/s 28.47 29.10 25.59 27.29 32.02 29.58
    Scenario 4 Convergence episode 30 30 30 30 80 80
    Average cumulative reward after convergence/10⁵ s -1.22 -1.25 -1.07 -1.20 -1.37 -1.79
    Average waiting time after convergence/s 38.08 39.26 33.52 37.38 42.77 55.83

    表  4  模型在4种试验场景下的表现

    Table  4.   Performance of the model in four experimental scenarios

    Model Test scenario 1 Test scenario 2 Test scenario 3 Test scenario 4
    Waiting time/s Fuel consumption/g Waiting time/s Fuel consumption/g Waiting time/s Fuel consumption/g Waiting time/s Fuel consumption/g
    CTMD3QN 42.04 67.56 30.46 56.01 26.25 51.54 36.34 61.68
    CTMDDQN 41.19 66.62 30.63 56.07 26.50 51.73 34.97 60.30
    DGD3QN 39.76 64.25 32.34 57.16 23.65 48.37 30.88 55.54
    DGDDQN 46.21 70.26 31.05 55.83 26.51 51.31 35.01 59.84
    FVD3QN 47.60 71.58 36.75 61.45 32.41 56.62 40.71 65.03
    FVDDQN 44.73 69.04 36.58 61.31 28.08 52.94 45.39 69.61
    Webster 46.85 71.12 39.33 63.76 32.57 57.20 39.31 63.98
Publication history
  • Received: 2024-10-17
  • Accepted: 2025-06-06
  • Revised: 2025-05-21
  • Published: 2025-08-28
