基于多智能体强化学习的多无人机协同交通监控路径优化

刘晓波; 轩辕婧祎; 谢沅志; 郑芳芳

doi:10.19818/j.cnki.1671-1637.2026.161

基于多智能体强化学习的多无人机协同交通监控路径优化

doi: 10.19818/j.cnki.1671-1637.2026.161

刘晓波^{1, 2,},
轩辕婧祎^{1, 2},
谢沅志^{1, 2},
郑芳芳^{1, 2, ,}

1.
西南交通大学交通运输与物流学院, 四川成都 611756
2.
西南交通大学综合交通大数据应用技术国家工程实验室, 四川成都 611756

基金项目:

国家自然科学基金重点项目 52232011

四川省科技计划项目 2025YFHZ0193

四川省科技计划项目 2025HJPJ0011

详细信息

作者简介:
刘晓波（1974-），男，四川成都人，教授，工学博士，E-mail：xiaobo.liu@swjtu.cn

通讯作者:
郑芳芳（1981-），女，浙江浦江人，教授，工学博士，E-mail：fzheng@swjtu.cn

中图分类号: U8
计量
- 文章访问数: 184
- HTML全文浏览量: 72
- PDF下载量: 50
- 被引次数: 0
出版历程
- 收稿日期: 2025-09-04
- 录用日期: 2026-01-23
- 修回日期: 2025-12-03
- 刊出日期: 2026-04-28

Cooperative traffic monitoring path optimization for multiple unmanned aerial vehicles based on multi-agent reinforcement learning

LIU Xiao-bo^{1, 2
,},
XUANYUAN Jing-yi^{1, 2},
XIE Yuan-zhi^{1, 2},
ZHENG Fang-fang^{1, 2
, ,}

1.
School of Transportation and Logistics, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
2.
National Engineering Laboratory for Applied Technology of Big Data in Integrated Transportation, Southwest Jiaotong University, Chengdu 611756, Sichuan, China

Funds:

Key Program of National Natural Science Foundation of China 52232011

Science and Technology Planning Project of Sichuan Province 2025YFHZ0193

Science and Technology Planning Project of Sichuan Province 2025HJPJ0011

More Information

Corresponding author: ZHENG Fang-fang, professor, PhD, E-mail: fzheng@swjtu.cn

Article Text (Baidu Translation)

摘要

摘要: 为优化含电池更换站约束的多无人机协同交通监控路径规划问题，构建了基于无人机团队定向问题的混合整数线性规划模型，采用聚类方法确定电池更换站选址，使其分布均匀；提出了基于多智能体Transformer的强化学习（MTRL）算法框架，采用集中式Transformer架构，编码器利用多头注意力机制学习场景图结构全局表示，解码器生成协作式路径决策；设计了基于目标点访问数量的奖励函数，优化无人机访问序列与电池更换策略；引入结构化掩码机制，消除子回路、重复访问及路径冲突，保证解的可行性；在不同目标点数量、电池更换站数量和无人机数量的9类规模场景下开展数值试验。试验结果表明：MTRL在全部9类场景中均能获得高质量可行解，训练收敛稳定；与商业求解器相比，小、中规模场景下平均累计奖励提升9.77%~28.77%，大规模场景下提升9.34%~14.84%，而遗传算法、禁忌搜索在大规模场景下较商业求解器下降28%~41%；推理时间保持毫秒级；在18组跨分布泛化试验中，相对误差均控制在1%以内。所提框架可为无人机集群任务规划、智能交通路径优化及物流配送调度提供高效求解方案，并为多智能体强化学习在复杂约束优化问题中的应用提供方法参考。
- 低空交通 /
- 无人机 /
- 路径优化 /
- 深度强化学习 /
- 团队定向问题 /
- 交通监控
Abstract: To optimize the multiple unmanned aerial vehicles (multi-UAVs) cooperative traffic monitoring path planning with battery replacement station constraints, a mixed-integer linear programming model based on the UAV team orienteering problem was constructed, and a clustering method was adopted to determine the battery replacement stations' locations to achieve uniform distribution. A multi-agent Transformer-based reinforcement learning (MTRL) algorithm framework was proposed, in which a centralized Transformer architecture was adopted. The encoder was used to learn the global graph-structured representation of the scenario via multi-head attention mechanism, and the decoder was used to generate collaborative path planning. A reward function based on the number of visited target nodes was designed to optimize the UAV visiting sequence and battery replacement strategy. A structured masking mechanism was introduced to eliminate subcircuits, repeated visits, and path conflicts, ensuring solution feasibility. Numerical experiments were conducted in scenarios of 9 types of scale with varying numbers of target nodes, battery replacement stations, and UAVs. The results show that MTRL obtains high-quality feasible solutions in all 9 types of scenarios with stable training convergence. Compared with the commercial solver, the average cumulative reward increases by 9.77%-28.77% in small- and medium-scale scenarios and by 9.34%-14.84% in large-scale scenarios, while that of the genetic algorithm and tabu search decreases by 28%-41% in large-scale scenarios. The inference time remains at the millisecond level. In 18 groups of cross-distribution generalization experiments, the relative error is controlled within 1%. The proposed framework provides an efficient solution for UAV swarm mission planning, intelligent transportation path optimization, and logistics distribution scheduling. In addition, it offers a methodological reference for the application of multi-agent reinforcement learning to complex constrained optimization problems.
- low-altitude traffic /
- unmanned aerial vehicle /
- path optimization /
- deep reinforcement learning /
- team orienteering problem /
- traffic monitoring

HTML全文

图 1 含电池更换站的无人机协同交通监控路径示例

Figure 1. Example paths for cooperative UAV traffic monitoring with battery replacement stations

下载: 全尺寸图片幻灯片

图 2 MTRL模型架构

Figure 2. Architecture of MTRL model

下载: 全尺寸图片幻灯片

图 3 多头注意力机制架构

Figure 3. Architecture of multi-head attention mechanism

下载: 全尺寸图片幻灯片

图 4 五种聚类算法性能对比

Figure 4. Comparison of performances among five clustering algorithms

下载: 全尺寸图片幻灯片

图 5 电池更换站选址结果

Figure 5. Location results of battery replacement stations

下载: 全尺寸图片幻灯片

图 6 MTRL模型训练奖励曲线

Figure 6. Training reward curves of MTRL model

下载: 全尺寸图片幻灯片

图 7 多无人机路径规划结果可视化示例

Figure 7. Visualization example of multi-UAV path planning results

下载: 全尺寸图片幻灯片

图 8 城市道路交通监控结果

Figure 8. Results of urban road traffic control

下载: 全尺寸图片幻灯片

表 1 掩码含义说明

Table 1. Explanation of masks

掩码符号	掩码名称	掩码含义
$ \boldsymbol{\mathcal{M}}_{1} $	重复性约束	已被任意无人机访问过的目标点被掩码
$ \boldsymbol{\mathcal{M}}_{2} $	能量约束	到达下一个目标点后剩余电量不够返回基地的目标点被掩码
$ \boldsymbol{\mathcal{M}}_{3} $	累计时间	到达下一个目标点并返回基地的累计时间超过最大任务时限的目标点被掩码
$ \boldsymbol{\mathcal{M}}_{4} $	电池更换站连续访问约束	从电池更换站出发时所有电池更换站被掩码
$ \boldsymbol{\mathcal{M}}_{5} $	电量安全约束	到达下一个目标点后剩余电量不够到达至少一个电池更换站的目标点被掩码
$ \boldsymbol{\mathcal{M}}_{6} $	基地访问规则	起始基地始终被掩码

下载: 导出CSV

表 2 不同问题规模计算结果比较

Table 2. Comparison of calculation results for problems of different sizes

算法		T20			T50			T100
算法		目标值（累计奖励）	提升比例/%	计算时间/s	目标值（累计奖励）	提升比例/%	计算时间/s	目标值（累计奖励）	提升比例/%	计算时间/s
A2	CPLEX	18.215 6	0.00	60.372 4	27.260 0	0.00	107.120 0	41.684 3	0.00	118.763 5
	TS	18.011 7	-1.12	20.271 1	23.179 7	-14.97	27.949 3	24.750 0	-40.63	39.259 9
	GA	18.050 8	-0.90	2.768 0	23.773 4	-12.80	4.030 0	26.238 2	-37.05	5.773 3
	MTRL	19.996 1	9.77	0.003 4	35.101 6	28.77	0.006 7	47.871 8	14.84	0.008 2
	PNRL	19.972 7	9.65	0.006 8	33.738 3	23.76	0.012 3	46.872 3	12.45	0.009 6
A3	CPLEX	16.639 2	0.00	112.956 1	37.726 3	0.00	106.723 9	52.763 2	0.00	117.763 8
	TS	19.996 1	20.17	19.625 0	32.140 6	-14.81	32.057 9	35.543 0	-32.64	44.795 7
	GA	20.000 0	20.20	3.080 1	32.058 6	-15.02	5.224 3	36.878 9	-30.10	7.763 9
	MTRL	20.000 0	20.20	0.006 1	43.750 0	15.97	0.005 7	60.518 1	14.70	0.009 3
	PNRL	19.991 2	20.15	0.007 3	42.339 8	12.23	0.012 3	58.375 9	10.64	0.011 6
A4	CPLEX	16.284 7	0.00	117.485 2	43.681 7	0.00	114.608 3	64.394 6	0.00	118.265 4
	TS	20.000 0	22.81	18.468 8	39.484 4	-9.61	35.205 9	45.058 6	-30.03	51.798 3
	GA	20.000 0	22.81	3.269 0	39.269 5	-10.10	6.441 4	46.183 5	-28.28	9.655 3
	MTRL	20.000 0	22.81	0.004 0	49.539 1	13.41	0.006 1	71.026 7	9.34	0.009 8
	PNRL	19.982 6	22.71	0.008 7	48.668 0	11.42	0.009 5	70.635 7	9.70	0.012 4

下载: 导出CSV

表 3 不同节点分布模式的性能对比

Table 3. Performance comparison of different node distribution modes

问题规模	$ {M}_{\mathrm{u}}\to {T}_{\mathrm{u}} $	$ {M}_{\mathrm{u}}\to {T}_{\mathrm{n}} $	$ {\eta }_{\mathrm{u}} $/%	$ {M}_{\mathrm{n}}\to {T}_{\mathrm{u}} $	$ {M}_{\mathrm{n}}\to {T}_{\mathrm{n}} $	$ {\eta }_{\mathrm{n}} $/%
T20B2A2	19.996 1	19.980 5	0.078 0	19.925 8	19.945 3	0.097 8
T20B2A3	20.000 0	20.000 0	0.000 0	20.000 0	20.000 0	0.000 0
T20B2A4	20.000 0	20.000 0	0.000 0	20.000 0	20.000 0	0.000 0
T50B5A2	35.101 6	34.906 2	0.556 7	34.207 0	34.105 5	0.297 6
T50B5A3	43.750 0	43.664 1	0.196 3	43.363 3	43.183 5	0.416 3
T50B5A4	49.539 1	49.441 4	0.197 2	49.164 1	49.087 4	0.156 3
T100B10A2	47.871 8	47.593 8	0.580 7	47.085 7	46.872 3	0.455 3
T100B10A3	60.518 1	60.257 5	0.430 6	58.473 4	58.375 9	0.167 0
T100B10A4	71.026 7	70.753 5	0.384 6	71.009 8	70.635 7	0.529 6

下载: 导出CSV

参考文献(39)

[1]	LIU S, BAI Y B. Multiple UAVs collaborative traffic monitoring with intention-based communication[J]. Computer Communications, 2023, 210: 116-129. doi: 10.1016/j.comcom.2023.08.005
[2]	WANG K, WU Q Q, HE X T, et al. Optimizing UAV traffic monitoring routes during rush hours considering spatiotemporal variation of monitoring demand[J]. International Journal of Geographical Information Science, 2022, 36(10): 2086-2111. doi: 10.1080/13658816.2022.2045605
[3]	COIFMAN B. Improved velocity estimation using single loop detectors[J]. Transportation Research Part A: Policy and Practice, 2001, 35(10): 863-880. doi: 10.1016/S0965-8564(00)00028-8
[4]	KOUTSIA A, SEMERTZIDIS T, DIMITROPOULOS K, et al. Intelligent traffic monitoring and surveillance with multiple cameras[C]//IEEE. 2008 International Workshop on Content-Based Multimedia Indexing. New York: IEEE, 2008: 125-132.
[5]	CAO P, XIONG Z Q, LIU X B. An analytical model for quantifying the efficiency of traffic-data collection using instrumented vehicles[J]. Transportation Research Part C: Emerging Technologies, 2022, 136: 103558. doi: 10.1016/j.trc.2022.103558
[6]	VANDENBERGHE W, VANHAUWAERT E, VERBRUGGE S, et al. Feasibility of expanding traffic monitoring systems with floating car data technology[J]. IET Intelligent Transport Systems, 2012, 6(4): 347-354. doi: 10.1049/iet-its.2011.0221
[7]	SEO T, KUSAKABE T, ASAKURA Y. Estimation of flow and density using probe vehicles with spacing measurement equipment[J]. Transportation Research Part C: Emerging Technologies, 2015, 53: 134-150. doi: 10.1016/j.trc.2015.01.033
[8]	LI X, SHU W, LI M L, et al. Performance evaluation of vehicle-based mobile sensor networks for traffic monitoring[J]. IEEE Transactions on Vehicular Technology, 2009, 58(4): 1647-1653. doi: 10.1109/TVT.2008.2005775
[9]	HUANG P D, CHENG M, CHEN Y P, et al. Traffic sign occlusion detection using mobile laser scanning point clouds[J]. IEEE Transactions on Intelligent Transportation Systems, 2017, 18(9): 2364-2376. doi: 10.1109/TITS.2016.2639582
[10]	JIAO J F, WANG H H. Traffic behavior recognition from traffic videos under occlusion condition: A Kalman filter approach[J]. Transportation Research Record : Journal of the Transportation Research Board, 2022, 2676(7): 55-65. doi: 10.1177/03611981221076426
[11]	SONG X G, PI R D, LV C, et al. Augmented multiple vehicles' trajectories extraction under occlusions with roadside LiDAR data[J]. IEEE Sensors Journal, 2021, 21(19): 21921-21930. doi: 10.1109/JSEN.2021.3079257
[12]	ZHAO J X, XU H, ZHANG Y B, et al. Automatic identification of vehicle partial occlusion in data collected by roadside LiDAR sensors[J]. Transportation Research Record: Journal of the Transportation Research Board, 2022, 2676(5): 708-718. doi: 10.1177/03611981211069347
[13]	LI M, ZHEN L, WANG S A, et al. Unmanned aerial vehicle scheduling problem for traffic monitoring[J]. Computers & Industrial Engineering, 2018, 122: 15-23.
[14]	LI S G, YU H K, ZHANG J R, et al. Video-based traffic data collection system for multiple vehicle types[J]. IET Intelligent Transport Systems, 2014, 8(2): 164-174. doi: 10.1049/iet-its.2012.0099
[15]	HUANG H L, SAVKIN A V, HUANG C. Decentralized autonomous navigation of a UAV network for road traffic monitoring[J]. IEEE Transactions on Aerospace and Electronic Systems, 2021, 57(4): 2558-2564. doi: 10.1109/TAES.2021.3053115
[16]	马庆禄, 王欣宇, 张书, 等. 智能网联环境下近邻匝道交通耦合自组织方法[J]. 交通运输工程学报, 2024, 24(2): 207-220. MA Qing-lu, WANG Xin-yu, ZHANG Shu, et al. Self-organizing method for traffic coupling between adjacent ramps in intelligent and connected environments[J]. Journal of Traffic and Transportation Engineering, 2024, 24(2): 207-220.
[17]	谢济铭, 夏玉兰, 钱正富, 等. 考虑智能网联近邻车辆信息的交织区换道风险预警[J]. 交通运输工程学报, 2023, 23(2): 287-300. doi: 10.19818/j.cnki.1671-1637.2023.02.021 XIE Ji-ming, XIA Yu-lan, QIAN Zheng-fu, et al. Lane-change risk warning in interweaving area considering information from intelligent connected near-neighboring vehicles[J]. Journal of Traffic and Transportation Engineering, 2023, 23(2): 287-300. doi: 10.19818/j.cnki.1671-1637.2023.02.021
[18]	YAN H, CHEN Y F, YANG S H. UAV-enabled wireless power transfer with base station charging and UAV power consumption[J]. IEEE Transactions on Vehicular Technology, 2020, 69(11): 12883-12896. doi: 10.1109/TVT.2020.3015246
[19]	COELHO B N, COELHO V N, COELHO I M, et al. A multi-objective green UAV routing problem[J]. Computers & Operations Research, 2017, 88: 306-315.
[20]	XU W Z, XU Z C, PENG J, et al. Approximation algorithms for the team orienteering problem[C]//IEEE. INFOCOM 2020 -IEEE Conference on Computer Communications. New York: IEEE, 2020: 1389-1398.
[21]	JUAN A A, MARUGAN C A, AHSINI Y, et al. Using reinforcement learning to solve a dynamic orienteering problem with random rewards affected by the battery status[J]. Batteries, 2023, 9(8): 416. doi: 10.3390/batteries9080416
[22]	AMMOURIOVA M, GUERRERO A, TSERTSVADZE V, et al. Using reinforcement learning in a dynamic team orienteering problem with electric batteries[J]. Batteries, 2024, 10(12): 411. doi: 10.3390/batteries10120411
[23]	LEE J J, RATHINAM S. Team orienteering and scheduling algorithms for collaborative UAV-UGV area coverage with battery constraints[C]//IEEE. 2025 International Conference on Unmanned Aircraft Systems (ICUAS). New York: IEEE, 2025: 625-632.
[24]	秦文龙, 罗贺, 李晓多, 等. 考虑多换电站的多无人机应急电力巡检路径规划方法[J]. 控制与决策, 2025, 40(8): 2391-2399. QIN Wen-long, LUO He, LI Xiao-duo, et al. Multi-UAV emergency power inspection path planning method considering multiple charging stations[J]. Control and Decision, 2025, 40(8): 2391-2399.
[25]	FUERTES D, DEL-BLANCO C R, JAUREGUIZAR F, et al. Solving routing problems for multiple cooperative Unmanned Aerial Vehicles using Transformer networks[J]. Engineering Applications of Artificial Intelligence, 2023, 122: 106085. doi: 10.1016/j.engappai.2023.106085
[26]	NOVOA C, STORER R. An approximate dynamic programming approach for the vehicle routing problem with stochastic demands[J]. European Journal of Operational Research, 2009, 196(2): 509-515. doi: 10.1016/j.ejor.2008.03.023
[27]	KIRÁLY A, ABONYI J. Redesign of the supply of mobile mechanics based on a novel genetic optimization algorithm using Google Maps API[J]. Engineering Applications of Artificial Intelligence, 2015, 38: 122-130. doi: 10.1016/j.engappai.2014.10.015
[28]	FUERTES D, DEL-BLANCO C R, JAUREGUIZAR F, et al. TOP-former: A multi-agent transformer approach for the team orienteering problem[J]. IEEE Transactions on Intelligent Transportation Systems, 2025, 26(9): 13799-13810. doi: 10.1109/TITS.2025.3566157
[29]	BAI L H, ZHENG F F, HOU K N, et al. Longitudinal control of automated vehicles: A novel approach by integrating deep reinforcement learning with intelligent driver model[J]. IEEE Transactions on Vehicular Technology, 2024, 73(8): 11014-11028. doi: 10.1109/TVT.2024.3376599
[30]	张洪海, 夷珈, 李姗, 等. 低空空域容量评估研究综述[J]. 交通运输工程学报, 2023, 23(6): 78-93. doi: 10.19818/j.cnki.1671-1637.2023.06.003 ZHANG Hong-hai, YI Jia, LI Shan, et al. Review on research of low-altitude airspace capacity evaluation[J]. Journal of Traffic and Transportation Engineering, 2023, 23(6): 78-93. doi: 10.19818/j.cnki.1671-1637.2023.06.003
[31]	李诚龙, 屈文秋, 李彦冬, 等. 面向eVTOL航空器的城市空中运输交通管理综述[J]. 交通运输工程学报, 2020, 20(4): 35-54. doi: 10.19818/j.cnki.1671-1637.2020.04.003 LI Cheng-long, QU Wen-qiu, LI Yan-dong, et al. Overview of traffic management of urban air mobility (UAM)with eVTOL aircraft[J]. Journal of Traffic and Transportation Engineering, 2020, 20(4): 35-54. doi: 10.19818/j.cnki.1671-1637.2020.04.003
[32]	刘伟, 钟灿, 曹文明. 基于数据驱动的路网连续交通流短时预测方法综述[J]. 交通运输工程学报, 2026, 26(2): 24-43. doi: 10.19818/j.cnki.1671-1637.2026.141 LIU Wei, ZHONG Can, CAO Wen-ming. Review of data-driven short-term prediction methods for continuous traffic flow in road networks[J]. Journal of Traffic and Transportation Engineering, 2026, 26(2): 24-43. doi: 10.19818/j.cnki.1671-1637.2026.141
[33]	LIN B, GHADDAR B, NATHWANI J. Deep reinforcement learning for the electric vehicle routing problem with time windows[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(8): 11528-11538. doi: 10.1109/TITS.2021.3105232
[34]	KOOL W, VAN HOOF H, WELLING M. Attention, learn to solve routing problems![C]//ICLR. 7th International Conference on Learning Representations. Washington DC: ICLR, 2019: 39.
[35]	REN L, FAN X Y, CUI J, et al. A multi-agent reinforcement learning method with route recorders for vehicle routing in supply chain management[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(9): 16410-16420. doi: 10.1109/TITS.2022.3150151
[36]	FAN M F, WU Y X, LIAO T J, et al. Deep reinforcement learning for UAV routing in the presence of multiple charging stations[J]. IEEE Transactions on Vehicular Technology, 2023, 72(5): 5732-5746. doi: 10.1109/TVT.2022.3232607
[37]	ZHANG K, HE F, ZHANG Z C, et al. Multi-vehicle routing problems with soft time windows: A multi-agent reinforcement learning approach[J]. Transportation Research Part C: Emerging Technologies, 2020, 121: 102861. doi: 10.1016/j.trc.2020.102861
[38]	VINYALS O, FORTUNATO M, JAITLY N. Pointer networks[J]. Advances in Neural Information Processing Systems, 2015, 28: 2692-2700.
[39]	CALINSKI T, HARABASZ J. A dendrite method for cluster analysis[J]. Communications in Statistics-Simulation and Computation, 1974, 3(1): 791519860.