Cooperative traffic monitoring path optimization for multiple unmanned aerial vehicles based on multi-agent reinforcement learning
-
摘要: 为优化含电池更换站约束的多无人机协同交通监控路径规划问题,构建了基于无人机团队定向问题的混合整数线性规划模型,采用聚类方法确定电池更换站选址,使其分布均匀;提出了基于多智能体Transformer的强化学习(MTRL)算法框架,采用集中式Transformer架构,编码器利用多头注意力机制学习场景图结构全局表示,解码器生成协作式路径决策;设计了基于目标点访问数量的奖励函数,优化无人机访问序列与电池更换策略;引入结构化掩码机制,消除子回路、重复访问及路径冲突,保证解的可行性;在不同目标点数量、电池更换站数量和无人机数量的9类规模场景下开展数值试验。试验结果表明:MTRL在全部9类场景中均能获得高质量可行解,训练收敛稳定;与商业求解器相比,小、中规模场景下平均累计奖励提升9.77%~28.77%,大规模场景下提升9.34%~14.84%,而遗传算法、禁忌搜索在大规模场景下较商业求解器下降28%~41%;推理时间保持毫秒级;在18组跨分布泛化试验中,相对误差均控制在1%以内。所提框架可为无人机集群任务规划、智能交通路径优化及物流配送调度提供高效求解方案,并为多智能体强化学习在复杂约束优化问题中的应用提供方法参考。Abstract: To optimize the multiple unmanned aerial vehicles (multi-UAVs) cooperative traffic monitoring path planning with battery replacement station constraints, a mixed-integer linear programming model based on the UAV team orienteering problem was constructed, and a clustering method was adopted to determine the battery replacement stations' locations to achieve uniform distribution. A multi-agent Transformer-based reinforcement learning (MTRL) algorithm framework was proposed, in which a centralized Transformer architecture was adopted. The encoder was used to learn the global graph-structured representation of the scenario via multi-head attention mechanism, and the decoder was used to generate collaborative path planning. A reward function based on the number of visited target nodes was designed to optimize the UAV visiting sequence and battery replacement strategy. A structured masking mechanism was introduced to eliminate subcircuits, repeated visits, and path conflicts, ensuring solution feasibility. Numerical experiments were conducted in scenarios of 9 types of scale with varying numbers of target nodes, battery replacement stations, and UAVs. The results show that MTRL obtains high-quality feasible solutions in all 9 types of scenarios with stable training convergence. Compared with the commercial solver, the average cumulative reward increases by 9.77%-28.77% in small- and medium-scale scenarios and by 9.34%-14.84% in large-scale scenarios, while that of the genetic algorithm and tabu search decreases by 28%-41% in large-scale scenarios. The inference time remains at the millisecond level. In 18 groups of cross-distribution generalization experiments, the relative error is controlled within 1%. The proposed framework provides an efficient solution for UAV swarm mission planning, intelligent transportation path optimization, and logistics distribution scheduling. In addition, it offers a methodological reference for the application of multi-agent reinforcement learning to complex constrained optimization problems.
-
表 1 掩码含义说明
Table 1. Explanation of masks
掩码符号 掩码名称 掩码含义 $ \boldsymbol{\mathcal{M}}_{1} $ 重复性约束 已被任意无人机访问过的目标点被掩码 $ \boldsymbol{\mathcal{M}}_{2} $ 能量约束 到达下一个目标点后剩余电量不够返回基地的目标点被掩码 $ \boldsymbol{\mathcal{M}}_{3} $ 累计时间 到达下一个目标点并返回基地的累计时间超过最大任务时限的目标点被掩码 $ \boldsymbol{\mathcal{M}}_{4} $ 电池更换站连续访问约束 从电池更换站出发时所有电池更换站被掩码 $ \boldsymbol{\mathcal{M}}_{5} $ 电量安全约束 到达下一个目标点后剩余电量不够到达至少一个电池更换站的目标点被掩码 $ \boldsymbol{\mathcal{M}}_{6} $ 基地访问规则 起始基地始终被掩码 表 2 不同问题规模计算结果比较
Table 2. Comparison of calculation results for problems of different sizes
算法 T20 T50 T100 目标值(累计奖励) 提升比例/% 计算时间/s 目标值(累计奖励) 提升比例/% 计算时间/s 目标值(累计奖励) 提升比例/% 计算时间/s A2 CPLEX 18.215 6 0.00 60.372 4 27.260 0 0.00 107.120 0 41.684 3 0.00 118.763 5 TS 18.011 7 -1.12 20.271 1 23.179 7 -14.97 27.949 3 24.750 0 -40.63 39.259 9 GA 18.050 8 -0.90 2.768 0 23.773 4 -12.80 4.030 0 26.238 2 -37.05 5.773 3 MTRL 19.996 1 9.77 0.003 4 35.101 6 28.77 0.006 7 47.871 8 14.84 0.008 2 PNRL 19.972 7 9.65 0.006 8 33.738 3 23.76 0.012 3 46.872 3 12.45 0.009 6 A3 CPLEX 16.639 2 0.00 112.956 1 37.726 3 0.00 106.723 9 52.763 2 0.00 117.763 8 TS 19.996 1 20.17 19.625 0 32.140 6 -14.81 32.057 9 35.543 0 -32.64 44.795 7 GA 20.000 0 20.20 3.080 1 32.058 6 -15.02 5.224 3 36.878 9 -30.10 7.763 9 MTRL 20.000 0 20.20 0.006 1 43.750 0 15.97 0.005 7 60.518 1 14.70 0.009 3 PNRL 19.991 2 20.15 0.007 3 42.339 8 12.23 0.012 3 58.375 9 10.64 0.011 6 A4 CPLEX 16.284 7 0.00 117.485 2 43.681 7 0.00 114.608 3 64.394 6 0.00 118.265 4 TS 20.000 0 22.81 18.468 8 39.484 4 -9.61 35.205 9 45.058 6 -30.03 51.798 3 GA 20.000 0 22.81 3.269 0 39.269 5 -10.10 6.441 4 46.183 5 -28.28 9.655 3 MTRL 20.000 0 22.81 0.004 0 49.539 1 13.41 0.006 1 71.026 7 9.34 0.009 8 PNRL 19.982 6 22.71 0.008 7 48.668 0 11.42 0.009 5 70.635 7 9.70 0.012 4 表 3 不同节点分布模式的性能对比
Table 3. Performance comparison of different node distribution modes
问题规模 $ {M}_{\mathrm{u}}\to {T}_{\mathrm{u}} $ $ {M}_{\mathrm{u}}\to {T}_{\mathrm{n}} $ $ {\eta }_{\mathrm{u}} $/% $ {M}_{\mathrm{n}}\to {T}_{\mathrm{u}} $ $ {M}_{\mathrm{n}}\to {T}_{\mathrm{n}} $ $ {\eta }_{\mathrm{n}} $/% T20B2A2 19.996 1 19.980 5 0.078 0 19.925 8 19.945 3 0.097 8 T20B2A3 20.000 0 20.000 0 0.000 0 20.000 0 20.000 0 0.000 0 T20B2A4 20.000 0 20.000 0 0.000 0 20.000 0 20.000 0 0.000 0 T50B5A2 35.101 6 34.906 2 0.556 7 34.207 0 34.105 5 0.297 6 T50B5A3 43.750 0 43.664 1 0.196 3 43.363 3 43.183 5 0.416 3 T50B5A4 49.539 1 49.441 4 0.197 2 49.164 1 49.087 4 0.156 3 T100B10A2 47.871 8 47.593 8 0.580 7 47.085 7 46.872 3 0.455 3 T100B10A3 60.518 1 60.257 5 0.430 6 58.473 4 58.375 9 0.167 0 T100B10A4 71.026 7 70.753 5 0.384 6 71.009 8 70.635 7 0.529 6 -
[1] LIU S, BAI Y B. Multiple UAVs collaborative traffic monitoring with intention-based communication[J]. Computer Communications, 2023, 210: 116-129. doi: 10.1016/j.comcom.2023.08.005 [2] WANG K, WU Q Q, HE X T, et al. Optimizing UAV traffic monitoring routes during rush hours considering spatiotemporal variation of monitoring demand[J]. International Journal of Geographical Information Science, 2022, 36(10): 2086-2111. doi: 10.1080/13658816.2022.2045605 [3] COIFMAN B. Improved velocity estimation using single loop detectors[J]. Transportation Research Part A: Policy and Practice, 2001, 35(10): 863-880. doi: 10.1016/S0965-8564(00)00028-8 [4] KOUTSIA A, SEMERTZIDIS T, DIMITROPOULOS K, et al. Intelligent traffic monitoring and surveillance with multiple cameras[C]//IEEE. 2008 International Workshop on Content-Based Multimedia Indexing. New York: IEEE, 2008: 125-132. [5] CAO P, XIONG Z Q, LIU X B. An analytical model for quantifying the efficiency of traffic-data collection using instrumented vehicles[J]. Transportation Research Part C: Emerging Technologies, 2022, 136: 103558. doi: 10.1016/j.trc.2022.103558 [6] VANDENBERGHE W, VANHAUWAERT E, VERBRUGGE S, et al. Feasibility of expanding traffic monitoring systems with floating car data technology[J]. IET Intelligent Transport Systems, 2012, 6(4): 347-354. doi: 10.1049/iet-its.2011.0221 [7] SEO T, KUSAKABE T, ASAKURA Y. Estimation of flow and density using probe vehicles with spacing measurement equipment[J]. Transportation Research Part C: Emerging Technologies, 2015, 53: 134-150. doi: 10.1016/j.trc.2015.01.033 [8] LI X, SHU W, LI M L, et al. Performance evaluation of vehicle-based mobile sensor networks for traffic monitoring[J]. IEEE Transactions on Vehicular Technology, 2009, 58(4): 1647-1653. doi: 10.1109/TVT.2008.2005775 [9] HUANG P D, CHENG M, CHEN Y P, et al. Traffic sign occlusion detection using mobile laser scanning point clouds[J]. IEEE Transactions on Intelligent Transportation Systems, 2017, 18(9): 2364-2376. doi: 10.1109/TITS.2016.2639582 [10] JIAO J F, WANG H H. Traffic behavior recognition from traffic videos under occlusion condition: A Kalman filter approach[J]. Transportation Research Record : Journal of the Transportation Research Board, 2022, 2676(7): 55-65. doi: 10.1177/03611981221076426 [11] SONG X G, PI R D, LV C, et al. Augmented multiple vehicles' trajectories extraction under occlusions with roadside LiDAR data[J]. IEEE Sensors Journal, 2021, 21(19): 21921-21930. doi: 10.1109/JSEN.2021.3079257 [12] ZHAO J X, XU H, ZHANG Y B, et al. Automatic identification of vehicle partial occlusion in data collected by roadside LiDAR sensors[J]. Transportation Research Record: Journal of the Transportation Research Board, 2022, 2676(5): 708-718. doi: 10.1177/03611981211069347 [13] LI M, ZHEN L, WANG S A, et al. Unmanned aerial vehicle scheduling problem for traffic monitoring[J]. Computers & Industrial Engineering, 2018, 122: 15-23. [14] LI S G, YU H K, ZHANG J R, et al. Video-based traffic data collection system for multiple vehicle types[J]. IET Intelligent Transport Systems, 2014, 8(2): 164-174. doi: 10.1049/iet-its.2012.0099 [15] HUANG H L, SAVKIN A V, HUANG C. Decentralized autonomous navigation of a UAV network for road traffic monitoring[J]. IEEE Transactions on Aerospace and Electronic Systems, 2021, 57(4): 2558-2564. doi: 10.1109/TAES.2021.3053115 [16] 马庆禄, 王欣宇, 张书, 等. 智能网联环境下近邻匝道交通耦合自组织方法[J]. 交通运输工程学报, 2024, 24(2): 207-220.MA Qing-lu, WANG Xin-yu, ZHANG Shu, et al. Self-organizing method for traffic coupling between adjacent ramps in intelligent and connected environments[J]. Journal of Traffic and Transportation Engineering, 2024, 24(2): 207-220. [17] 谢济铭, 夏玉兰, 钱正富, 等. 考虑智能网联近邻车辆信息的交织区换道风险预警[J]. 交通运输工程学报, 2023, 23(2): 287-300. doi: 10.19818/j.cnki.1671-1637.2023.02.021 XIE Ji-ming, XIA Yu-lan, QIAN Zheng-fu, et al. Lane-change risk warning in interweaving area considering information from intelligent connected near-neighboring vehicles[J]. Journal of Traffic and Transportation Engineering, 2023, 23(2): 287-300. doi: 10.19818/j.cnki.1671-1637.2023.02.021 [18] YAN H, CHEN Y F, YANG S H. UAV-enabled wireless power transfer with base station charging and UAV power consumption[J]. IEEE Transactions on Vehicular Technology, 2020, 69(11): 12883-12896. doi: 10.1109/TVT.2020.3015246 [19] COELHO B N, COELHO V N, COELHO I M, et al. A multi-objective green UAV routing problem[J]. Computers & Operations Research, 2017, 88: 306-315. [20] XU W Z, XU Z C, PENG J, et al. Approximation algorithms for the team orienteering problem[C]//IEEE. INFOCOM 2020 -IEEE Conference on Computer Communications. New York: IEEE, 2020: 1389-1398. [21] JUAN A A, MARUGAN C A, AHSINI Y, et al. Using reinforcement learning to solve a dynamic orienteering problem with random rewards affected by the battery status[J]. Batteries, 2023, 9(8): 416. doi: 10.3390/batteries9080416 [22] AMMOURIOVA M, GUERRERO A, TSERTSVADZE V, et al. Using reinforcement learning in a dynamic team orienteering problem with electric batteries[J]. Batteries, 2024, 10(12): 411. doi: 10.3390/batteries10120411 [23] LEE J J, RATHINAM S. Team orienteering and scheduling algorithms for collaborative UAV-UGV area coverage with battery constraints[C]//IEEE. 2025 International Conference on Unmanned Aircraft Systems (ICUAS). New York: IEEE, 2025: 625-632. [24] 秦文龙, 罗贺, 李晓多, 等. 考虑多换电站的多无人机应急电力巡检路径规划方法[J]. 控制与决策, 2025, 40(8): 2391-2399.QIN Wen-long, LUO He, LI Xiao-duo, et al. Multi-UAV emergency power inspection path planning method considering multiple charging stations[J]. Control and Decision, 2025, 40(8): 2391-2399. [25] FUERTES D, DEL-BLANCO C R, JAUREGUIZAR F, et al. Solving routing problems for multiple cooperative Unmanned Aerial Vehicles using Transformer networks[J]. Engineering Applications of Artificial Intelligence, 2023, 122: 106085. doi: 10.1016/j.engappai.2023.106085 [26] NOVOA C, STORER R. An approximate dynamic programming approach for the vehicle routing problem with stochastic demands[J]. European Journal of Operational Research, 2009, 196(2): 509-515. doi: 10.1016/j.ejor.2008.03.023 [27] KIRÁLY A, ABONYI J. Redesign of the supply of mobile mechanics based on a novel genetic optimization algorithm using Google Maps API[J]. Engineering Applications of Artificial Intelligence, 2015, 38: 122-130. doi: 10.1016/j.engappai.2014.10.015 [28] FUERTES D, DEL-BLANCO C R, JAUREGUIZAR F, et al. TOP-former: A multi-agent transformer approach for the team orienteering problem[J]. IEEE Transactions on Intelligent Transportation Systems, 2025, 26(9): 13799-13810. doi: 10.1109/TITS.2025.3566157 [29] BAI L H, ZHENG F F, HOU K N, et al. Longitudinal control of automated vehicles: A novel approach by integrating deep reinforcement learning with intelligent driver model[J]. IEEE Transactions on Vehicular Technology, 2024, 73(8): 11014-11028. doi: 10.1109/TVT.2024.3376599 [30] 张洪海, 夷珈, 李姗, 等. 低空空域容量评估研究综述[J]. 交通运输工程学报, 2023, 23(6): 78-93. doi: 10.19818/j.cnki.1671-1637.2023.06.003 ZHANG Hong-hai, YI Jia, LI Shan, et al. Review on research of low-altitude airspace capacity evaluation[J]. Journal of Traffic and Transportation Engineering, 2023, 23(6): 78-93. doi: 10.19818/j.cnki.1671-1637.2023.06.003 [31] 李诚龙, 屈文秋, 李彦冬, 等. 面向eVTOL航空器的城市空中运输交通管理综述[J]. 交通运输工程学报, 2020, 20(4): 35-54. doi: 10.19818/j.cnki.1671-1637.2020.04.003 LI Cheng-long, QU Wen-qiu, LI Yan-dong, et al. Overview of traffic management of urban air mobility (UAM)with eVTOL aircraft[J]. Journal of Traffic and Transportation Engineering, 2020, 20(4): 35-54. doi: 10.19818/j.cnki.1671-1637.2020.04.003 [32] 刘伟, 钟灿, 曹文明. 基于数据驱动的路网连续交通流短时预测方法综述[J]. 交通运输工程学报, 2026, 26(2): 24-43. doi: 10.19818/j.cnki.1671-1637.2026.141 LIU Wei, ZHONG Can, CAO Wen-ming. Review of data-driven short-term prediction methods for continuous traffic flow in road networks[J]. Journal of Traffic and Transportation Engineering, 2026, 26(2): 24-43. doi: 10.19818/j.cnki.1671-1637.2026.141 [33] LIN B, GHADDAR B, NATHWANI J. Deep reinforcement learning for the electric vehicle routing problem with time windows[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(8): 11528-11538. doi: 10.1109/TITS.2021.3105232 [34] KOOL W, VAN HOOF H, WELLING M. Attention, learn to solve routing problems![C]//ICLR. 7th International Conference on Learning Representations. Washington DC: ICLR, 2019: 39. [35] REN L, FAN X Y, CUI J, et al. A multi-agent reinforcement learning method with route recorders for vehicle routing in supply chain management[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(9): 16410-16420. doi: 10.1109/TITS.2022.3150151 [36] FAN M F, WU Y X, LIAO T J, et al. Deep reinforcement learning for UAV routing in the presence of multiple charging stations[J]. IEEE Transactions on Vehicular Technology, 2023, 72(5): 5732-5746. doi: 10.1109/TVT.2022.3232607 [37] ZHANG K, HE F, ZHANG Z C, et al. Multi-vehicle routing problems with soft time windows: A multi-agent reinforcement learning approach[J]. Transportation Research Part C: Emerging Technologies, 2020, 121: 102861. doi: 10.1016/j.trc.2020.102861 [38] VINYALS O, FORTUNATO M, JAITLY N. Pointer networks[J]. Advances in Neural Information Processing Systems, 2015, 28: 2692-2700. [39] CALINSKI T, HARABASZ J. A dendrite method for cluster analysis[J]. Communications in Statistics-Simulation and Computation, 1974, 3(1): 791519860. -
下载: