留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于多智能体强化学习的多无人机协同交通监控路径优化

刘晓波 轩辕婧祎 谢沅志 郑芳芳

刘晓波, 轩辕婧祎, 谢沅志, 郑芳芳. 基于多智能体强化学习的多无人机协同交通监控路径优化[J]. 交通运输工程学报, 2026, 26(4): 15-32. doi: 10.19818/j.cnki.1671-1637.2026.161
引用本文: 刘晓波, 轩辕婧祎, 谢沅志, 郑芳芳. 基于多智能体强化学习的多无人机协同交通监控路径优化[J]. 交通运输工程学报, 2026, 26(4): 15-32. doi: 10.19818/j.cnki.1671-1637.2026.161
LIU Xiao-bo, XUANYUAN Jing-yi, XIE Yuan-zhi, ZHENG Fang-fang. Cooperative traffic monitoring path optimization for multiple unmanned aerial vehicles based on multi-agent reinforcement learning[J]. Journal of Traffic and Transportation Engineering, 2026, 26(4): 15-32. doi: 10.19818/j.cnki.1671-1637.2026.161
Citation: LIU Xiao-bo, XUANYUAN Jing-yi, XIE Yuan-zhi, ZHENG Fang-fang. Cooperative traffic monitoring path optimization for multiple unmanned aerial vehicles based on multi-agent reinforcement learning[J]. Journal of Traffic and Transportation Engineering, 2026, 26(4): 15-32. doi: 10.19818/j.cnki.1671-1637.2026.161

基于多智能体强化学习的多无人机协同交通监控路径优化

doi: 10.19818/j.cnki.1671-1637.2026.161
基金项目: 

国家自然科学基金重点项目 52232011

四川省科技计划项目 2025YFHZ0193

四川省科技计划项目 2025HJPJ0011

详细信息
    作者简介:

    刘晓波(1974-),男,四川成都人,教授,工学博士,E-mail:xiaobo.liu@swjtu.cn

    通讯作者:

    郑芳芳(1981-),女,浙江浦江人,教授,工学博士,E-mail:fzheng@swjtu.cn

  • 中图分类号: U8

Cooperative traffic monitoring path optimization for multiple unmanned aerial vehicles based on multi-agent reinforcement learning

Funds: 

Key Program of National Natural Science Foundation of China 52232011

Science and Technology Planning Project of Sichuan Province 2025YFHZ0193

Science and Technology Planning Project of Sichuan Province 2025HJPJ0011

More Information
    Corresponding author: ZHENG Fang-fang, professor, PhD, E-mail: fzheng@swjtu.cn
Article Text (Baidu Translation)
  • 摘要: 为优化含电池更换站约束的多无人机协同交通监控路径规划问题,构建了基于无人机团队定向问题的混合整数线性规划模型,采用聚类方法确定电池更换站选址,使其分布均匀;提出了基于多智能体Transformer的强化学习(MTRL)算法框架,采用集中式Transformer架构,编码器利用多头注意力机制学习场景图结构全局表示,解码器生成协作式路径决策;设计了基于目标点访问数量的奖励函数,优化无人机访问序列与电池更换策略;引入结构化掩码机制,消除子回路、重复访问及路径冲突,保证解的可行性;在不同目标点数量、电池更换站数量和无人机数量的9类规模场景下开展数值试验。试验结果表明:MTRL在全部9类场景中均能获得高质量可行解,训练收敛稳定;与商业求解器相比,小、中规模场景下平均累计奖励提升9.77%~28.77%,大规模场景下提升9.34%~14.84%,而遗传算法、禁忌搜索在大规模场景下较商业求解器下降28%~41%;推理时间保持毫秒级;在18组跨分布泛化试验中,相对误差均控制在1%以内。所提框架可为无人机集群任务规划、智能交通路径优化及物流配送调度提供高效求解方案,并为多智能体强化学习在复杂约束优化问题中的应用提供方法参考。

     

  • 图  1  含电池更换站的无人机协同交通监控路径示例

    Figure  1.  Example paths for cooperative UAV traffic monitoring with battery replacement stations

    图  2  MTRL模型架构

    Figure  2.  Architecture of MTRL model

    图  3  多头注意力机制架构

    Figure  3.  Architecture of multi-head attention mechanism

    图  4  五种聚类算法性能对比

    Figure  4.  Comparison of performances among five clustering algorithms

    图  5  电池更换站选址结果

    Figure  5.  Location results of battery replacement stations

    图  6  MTRL模型训练奖励曲线

    Figure  6.  Training reward curves of MTRL model

    图  7  多无人机路径规划结果可视化示例

    Figure  7.  Visualization example of multi-UAV path planning results

    图  8  城市道路交通监控结果

    Figure  8.  Results of urban road traffic control

    表  1  掩码含义说明

    Table  1.   Explanation of masks

    掩码符号 掩码名称 掩码含义
    $ \boldsymbol{\mathcal{M}}_{1} $ 重复性约束 已被任意无人机访问过的目标点被掩码
    $ \boldsymbol{\mathcal{M}}_{2} $ 能量约束 到达下一个目标点后剩余电量不够返回基地的目标点被掩码
    $ \boldsymbol{\mathcal{M}}_{3} $ 累计时间 到达下一个目标点并返回基地的累计时间超过最大任务时限的目标点被掩码
    $ \boldsymbol{\mathcal{M}}_{4} $ 电池更换站连续访问约束 从电池更换站出发时所有电池更换站被掩码
    $ \boldsymbol{\mathcal{M}}_{5} $ 电量安全约束 到达下一个目标点后剩余电量不够到达至少一个电池更换站的目标点被掩码
    $ \boldsymbol{\mathcal{M}}_{6} $ 基地访问规则 起始基地始终被掩码
    下载: 导出CSV

    表  2  不同问题规模计算结果比较

    Table  2.   Comparison of calculation results for problems of different sizes

    算法 T20 T50 T100
    目标值(累计奖励) 提升比例/% 计算时间/s 目标值(累计奖励) 提升比例/% 计算时间/s 目标值(累计奖励) 提升比例/% 计算时间/s
    A2 CPLEX 18.215 6 0.00 60.372 4 27.260 0 0.00 107.120 0 41.684 3 0.00 118.763 5
    TS 18.011 7 -1.12 20.271 1 23.179 7 -14.97 27.949 3 24.750 0 -40.63 39.259 9
    GA 18.050 8 -0.90 2.768 0 23.773 4 -12.80 4.030 0 26.238 2 -37.05 5.773 3
    MTRL 19.996 1 9.77 0.003 4 35.101 6 28.77 0.006 7 47.871 8 14.84 0.008 2
    PNRL 19.972 7 9.65 0.006 8 33.738 3 23.76 0.012 3 46.872 3 12.45 0.009 6
    A3 CPLEX 16.639 2 0.00 112.956 1 37.726 3 0.00 106.723 9 52.763 2 0.00 117.763 8
    TS 19.996 1 20.17 19.625 0 32.140 6 -14.81 32.057 9 35.543 0 -32.64 44.795 7
    GA 20.000 0 20.20 3.080 1 32.058 6 -15.02 5.224 3 36.878 9 -30.10 7.763 9
    MTRL 20.000 0 20.20 0.006 1 43.750 0 15.97 0.005 7 60.518 1 14.70 0.009 3
    PNRL 19.991 2 20.15 0.007 3 42.339 8 12.23 0.012 3 58.375 9 10.64 0.011 6
    A4 CPLEX 16.284 7 0.00 117.485 2 43.681 7 0.00 114.608 3 64.394 6 0.00 118.265 4
    TS 20.000 0 22.81 18.468 8 39.484 4 -9.61 35.205 9 45.058 6 -30.03 51.798 3
    GA 20.000 0 22.81 3.269 0 39.269 5 -10.10 6.441 4 46.183 5 -28.28 9.655 3
    MTRL 20.000 0 22.81 0.004 0 49.539 1 13.41 0.006 1 71.026 7 9.34 0.009 8
    PNRL 19.982 6 22.71 0.008 7 48.668 0 11.42 0.009 5 70.635 7 9.70 0.012 4
    下载: 导出CSV

    表  3  不同节点分布模式的性能对比

    Table  3.   Performance comparison of different node distribution modes

    问题规模 $ {M}_{\mathrm{u}}\to {T}_{\mathrm{u}} $ $ {M}_{\mathrm{u}}\to {T}_{\mathrm{n}} $ $ {\eta }_{\mathrm{u}} $/% $ {M}_{\mathrm{n}}\to {T}_{\mathrm{u}} $ $ {M}_{\mathrm{n}}\to {T}_{\mathrm{n}} $ $ {\eta }_{\mathrm{n}} $/%
    T20B2A2 19.996 1 19.980 5 0.078 0 19.925 8 19.945 3 0.097 8
    T20B2A3 20.000 0 20.000 0 0.000 0 20.000 0 20.000 0 0.000 0
    T20B2A4 20.000 0 20.000 0 0.000 0 20.000 0 20.000 0 0.000 0
    T50B5A2 35.101 6 34.906 2 0.556 7 34.207 0 34.105 5 0.297 6
    T50B5A3 43.750 0 43.664 1 0.196 3 43.363 3 43.183 5 0.416 3
    T50B5A4 49.539 1 49.441 4 0.197 2 49.164 1 49.087 4 0.156 3
    T100B10A2 47.871 8 47.593 8 0.580 7 47.085 7 46.872 3 0.455 3
    T100B10A3 60.518 1 60.257 5 0.430 6 58.473 4 58.375 9 0.167 0
    T100B10A4 71.026 7 70.753 5 0.384 6 71.009 8 70.635 7 0.529 6
    下载: 导出CSV
  • [1] LIU S, BAI Y B. Multiple UAVs collaborative traffic monitoring with intention-based communication[J]. Computer Communications, 2023, 210: 116-129. doi: 10.1016/j.comcom.2023.08.005
    [2] WANG K, WU Q Q, HE X T, et al. Optimizing UAV traffic monitoring routes during rush hours considering spatiotemporal variation of monitoring demand[J]. International Journal of Geographical Information Science, 2022, 36(10): 2086-2111. doi: 10.1080/13658816.2022.2045605
    [3] COIFMAN B. Improved velocity estimation using single loop detectors[J]. Transportation Research Part A: Policy and Practice, 2001, 35(10): 863-880. doi: 10.1016/S0965-8564(00)00028-8
    [4] KOUTSIA A, SEMERTZIDIS T, DIMITROPOULOS K, et al. Intelligent traffic monitoring and surveillance with multiple cameras[C]//IEEE. 2008 International Workshop on Content-Based Multimedia Indexing. New York: IEEE, 2008: 125-132.
    [5] CAO P, XIONG Z Q, LIU X B. An analytical model for quantifying the efficiency of traffic-data collection using instrumented vehicles[J]. Transportation Research Part C: Emerging Technologies, 2022, 136: 103558. doi: 10.1016/j.trc.2022.103558
    [6] VANDENBERGHE W, VANHAUWAERT E, VERBRUGGE S, et al. Feasibility of expanding traffic monitoring systems with floating car data technology[J]. IET Intelligent Transport Systems, 2012, 6(4): 347-354. doi: 10.1049/iet-its.2011.0221
    [7] SEO T, KUSAKABE T, ASAKURA Y. Estimation of flow and density using probe vehicles with spacing measurement equipment[J]. Transportation Research Part C: Emerging Technologies, 2015, 53: 134-150. doi: 10.1016/j.trc.2015.01.033
    [8] LI X, SHU W, LI M L, et al. Performance evaluation of vehicle-based mobile sensor networks for traffic monitoring[J]. IEEE Transactions on Vehicular Technology, 2009, 58(4): 1647-1653. doi: 10.1109/TVT.2008.2005775
    [9] HUANG P D, CHENG M, CHEN Y P, et al. Traffic sign occlusion detection using mobile laser scanning point clouds[J]. IEEE Transactions on Intelligent Transportation Systems, 2017, 18(9): 2364-2376. doi: 10.1109/TITS.2016.2639582
    [10] JIAO J F, WANG H H. Traffic behavior recognition from traffic videos under occlusion condition: A Kalman filter approach[J]. Transportation Research Record : Journal of the Transportation Research Board, 2022, 2676(7): 55-65. doi: 10.1177/03611981221076426
    [11] SONG X G, PI R D, LV C, et al. Augmented multiple vehicles' trajectories extraction under occlusions with roadside LiDAR data[J]. IEEE Sensors Journal, 2021, 21(19): 21921-21930. doi: 10.1109/JSEN.2021.3079257
    [12] ZHAO J X, XU H, ZHANG Y B, et al. Automatic identification of vehicle partial occlusion in data collected by roadside LiDAR sensors[J]. Transportation Research Record: Journal of the Transportation Research Board, 2022, 2676(5): 708-718. doi: 10.1177/03611981211069347
    [13] LI M, ZHEN L, WANG S A, et al. Unmanned aerial vehicle scheduling problem for traffic monitoring[J]. Computers & Industrial Engineering, 2018, 122: 15-23.
    [14] LI S G, YU H K, ZHANG J R, et al. Video-based traffic data collection system for multiple vehicle types[J]. IET Intelligent Transport Systems, 2014, 8(2): 164-174. doi: 10.1049/iet-its.2012.0099
    [15] HUANG H L, SAVKIN A V, HUANG C. Decentralized autonomous navigation of a UAV network for road traffic monitoring[J]. IEEE Transactions on Aerospace and Electronic Systems, 2021, 57(4): 2558-2564. doi: 10.1109/TAES.2021.3053115
    [16] 马庆禄, 王欣宇, 张书, 等. 智能网联环境下近邻匝道交通耦合自组织方法[J]. 交通运输工程学报, 2024, 24(2): 207-220.

    MA Qing-lu, WANG Xin-yu, ZHANG Shu, et al. Self-organizing method for traffic coupling between adjacent ramps in intelligent and connected environments[J]. Journal of Traffic and Transportation Engineering, 2024, 24(2): 207-220.
    [17] 谢济铭, 夏玉兰, 钱正富, 等. 考虑智能网联近邻车辆信息的交织区换道风险预警[J]. 交通运输工程学报, 2023, 23(2): 287-300. doi: 10.19818/j.cnki.1671-1637.2023.02.021

    XIE Ji-ming, XIA Yu-lan, QIAN Zheng-fu, et al. Lane-change risk warning in interweaving area considering information from intelligent connected near-neighboring vehicles[J]. Journal of Traffic and Transportation Engineering, 2023, 23(2): 287-300. doi: 10.19818/j.cnki.1671-1637.2023.02.021
    [18] YAN H, CHEN Y F, YANG S H. UAV-enabled wireless power transfer with base station charging and UAV power consumption[J]. IEEE Transactions on Vehicular Technology, 2020, 69(11): 12883-12896. doi: 10.1109/TVT.2020.3015246
    [19] COELHO B N, COELHO V N, COELHO I M, et al. A multi-objective green UAV routing problem[J]. Computers & Operations Research, 2017, 88: 306-315.
    [20] XU W Z, XU Z C, PENG J, et al. Approximation algorithms for the team orienteering problem[C]//IEEE. INFOCOM 2020 -IEEE Conference on Computer Communications. New York: IEEE, 2020: 1389-1398.
    [21] JUAN A A, MARUGAN C A, AHSINI Y, et al. Using reinforcement learning to solve a dynamic orienteering problem with random rewards affected by the battery status[J]. Batteries, 2023, 9(8): 416. doi: 10.3390/batteries9080416
    [22] AMMOURIOVA M, GUERRERO A, TSERTSVADZE V, et al. Using reinforcement learning in a dynamic team orienteering problem with electric batteries[J]. Batteries, 2024, 10(12): 411. doi: 10.3390/batteries10120411
    [23] LEE J J, RATHINAM S. Team orienteering and scheduling algorithms for collaborative UAV-UGV area coverage with battery constraints[C]//IEEE. 2025 International Conference on Unmanned Aircraft Systems (ICUAS). New York: IEEE, 2025: 625-632.
    [24] 秦文龙, 罗贺, 李晓多, 等. 考虑多换电站的多无人机应急电力巡检路径规划方法[J]. 控制与决策, 2025, 40(8): 2391-2399.

    QIN Wen-long, LUO He, LI Xiao-duo, et al. Multi-UAV emergency power inspection path planning method considering multiple charging stations[J]. Control and Decision, 2025, 40(8): 2391-2399.
    [25] FUERTES D, DEL-BLANCO C R, JAUREGUIZAR F, et al. Solving routing problems for multiple cooperative Unmanned Aerial Vehicles using Transformer networks[J]. Engineering Applications of Artificial Intelligence, 2023, 122: 106085. doi: 10.1016/j.engappai.2023.106085
    [26] NOVOA C, STORER R. An approximate dynamic programming approach for the vehicle routing problem with stochastic demands[J]. European Journal of Operational Research, 2009, 196(2): 509-515. doi: 10.1016/j.ejor.2008.03.023
    [27] KIRÁLY A, ABONYI J. Redesign of the supply of mobile mechanics based on a novel genetic optimization algorithm using Google Maps API[J]. Engineering Applications of Artificial Intelligence, 2015, 38: 122-130. doi: 10.1016/j.engappai.2014.10.015
    [28] FUERTES D, DEL-BLANCO C R, JAUREGUIZAR F, et al. TOP-former: A multi-agent transformer approach for the team orienteering problem[J]. IEEE Transactions on Intelligent Transportation Systems, 2025, 26(9): 13799-13810. doi: 10.1109/TITS.2025.3566157
    [29] BAI L H, ZHENG F F, HOU K N, et al. Longitudinal control of automated vehicles: A novel approach by integrating deep reinforcement learning with intelligent driver model[J]. IEEE Transactions on Vehicular Technology, 2024, 73(8): 11014-11028. doi: 10.1109/TVT.2024.3376599
    [30] 张洪海, 夷珈, 李姗, 等. 低空空域容量评估研究综述[J]. 交通运输工程学报, 2023, 23(6): 78-93. doi: 10.19818/j.cnki.1671-1637.2023.06.003

    ZHANG Hong-hai, YI Jia, LI Shan, et al. Review on research of low-altitude airspace capacity evaluation[J]. Journal of Traffic and Transportation Engineering, 2023, 23(6): 78-93. doi: 10.19818/j.cnki.1671-1637.2023.06.003
    [31] 李诚龙, 屈文秋, 李彦冬, 等. 面向eVTOL航空器的城市空中运输交通管理综述[J]. 交通运输工程学报, 2020, 20(4): 35-54. doi: 10.19818/j.cnki.1671-1637.2020.04.003

    LI Cheng-long, QU Wen-qiu, LI Yan-dong, et al. Overview of traffic management of urban air mobility (UAM)with eVTOL aircraft[J]. Journal of Traffic and Transportation Engineering, 2020, 20(4): 35-54. doi: 10.19818/j.cnki.1671-1637.2020.04.003
    [32] 刘伟, 钟灿, 曹文明. 基于数据驱动的路网连续交通流短时预测方法综述[J]. 交通运输工程学报, 2026, 26(2): 24-43. doi: 10.19818/j.cnki.1671-1637.2026.141

    LIU Wei, ZHONG Can, CAO Wen-ming. Review of data-driven short-term prediction methods for continuous traffic flow in road networks[J]. Journal of Traffic and Transportation Engineering, 2026, 26(2): 24-43. doi: 10.19818/j.cnki.1671-1637.2026.141
    [33] LIN B, GHADDAR B, NATHWANI J. Deep reinforcement learning for the electric vehicle routing problem with time windows[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(8): 11528-11538. doi: 10.1109/TITS.2021.3105232
    [34] KOOL W, VAN HOOF H, WELLING M. Attention, learn to solve routing problems![C]//ICLR. 7th International Conference on Learning Representations. Washington DC: ICLR, 2019: 39.
    [35] REN L, FAN X Y, CUI J, et al. A multi-agent reinforcement learning method with route recorders for vehicle routing in supply chain management[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(9): 16410-16420. doi: 10.1109/TITS.2022.3150151
    [36] FAN M F, WU Y X, LIAO T J, et al. Deep reinforcement learning for UAV routing in the presence of multiple charging stations[J]. IEEE Transactions on Vehicular Technology, 2023, 72(5): 5732-5746. doi: 10.1109/TVT.2022.3232607
    [37] ZHANG K, HE F, ZHANG Z C, et al. Multi-vehicle routing problems with soft time windows: A multi-agent reinforcement learning approach[J]. Transportation Research Part C: Emerging Technologies, 2020, 121: 102861. doi: 10.1016/j.trc.2020.102861
    [38] VINYALS O, FORTUNATO M, JAITLY N. Pointer networks[J]. Advances in Neural Information Processing Systems, 2015, 28: 2692-2700.
    [39] CALINSKI T, HARABASZ J. A dendrite method for cluster analysis[J]. Communications in Statistics-Simulation and Computation, 1974, 3(1): 791519860.
  • 加载中
图(8) / 表(3)
计量
  • 文章访问数:  184
  • HTML全文浏览量:  72
  • PDF下载量:  50
  • 被引次数: 0
出版历程
  • 收稿日期:  2025-09-04
  • 录用日期:  2026-01-23
  • 修回日期:  2025-12-03
  • 刊出日期:  2026-04-28

目录

    /

    返回文章
    返回