Cooperative traffic monitoring path optimization for multiple unmanned aerial vehicles based on multi-agent reinforcement learning

LIU Xiao-bo; XUANYUAN Jing-yi; XIE Yuan-zhi; ZHENG Fang-fang

doi:10.19818/j.cnki.1671-1637.2026.161

Volume 26 Issue 4

Apr. 2026

LIU Xiao-bo, XUANYUAN Jing-yi, XIE Yuan-zhi, ZHENG Fang-fang. Cooperative traffic monitoring path optimization for multiple unmanned aerial vehicles based on multi-agent reinforcement learning[J]. Journal of Traffic and Transportation Engineering, 2026, 26(4): 15-32. doi: 10.19818/j.cnki.1671-1637.2026.161

LIU Xiao-bo^{1, 2}, XUANYUAN Jing-yi^{1, 2}, XIE Yuan-zhi^{1, 2}, ZHENG Fang-fang^{1, 2}

1. School of Transportation and Logistics, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
2. National Engineering Laboratory for Applied Technology of Big Data in Integrated Transportation, Southwest Jiaotong University, Chengdu 611756, Sichuan, China

Funds:

Key Program of National Natural Science Foundation of China (52232011)

Science and Technology Planning Project of Sichuan Province (2025YFHZ0193)

Science and Technology Planning Project of Sichuan Province (2025HJPJ0011)

Corresponding author: ZHENG Fang-fang, professor, PhD, E-mail: fzheng@swjtu.cn
Received Date: 2025-09-04
Accepted Date: 2026-01-23
Rev Recd Date: 2025-12-03
Publish Date: 2026-04-28

Abstract

To optimize cooperative traffic monitoring path planning for multiple unmanned aerial vehicles (multi-UAVs) under battery replacement station constraints, a mixed-integer linear programming model based on the UAV team orienteering problem was constructed, and a clustering method was used to locate the battery replacement stations so that they are uniformly distributed. A multi-agent Transformer-based reinforcement learning (MTRL) framework with a centralized Transformer architecture was proposed. The encoder learns a global graph-structured representation of the scenario through a multi-head attention mechanism, and the decoder generates the collaborative paths. A reward function based on the number of visited target nodes was designed to optimize the UAV visiting sequence and battery replacement strategy. A structured masking mechanism was introduced to eliminate subcircuits, repeated visits, and path conflicts, which ensures solution feasibility. Numerical experiments were conducted at nine problem scales with varying numbers of target nodes, battery replacement stations, and UAVs. The results show that MTRL obtains high-quality feasible solutions at all nine scales with stable training convergence. Compared with the commercial solver, the average cumulative reward of MTRL increases by 9.77%-28.77% in small- and medium-scale scenarios and by 9.34%-14.84% in large-scale scenarios, whereas that of the genetic algorithm and tabu search decreases by 28%-41% in large-scale scenarios. The inference time remains at the millisecond level. In 18 groups of cross-distribution generalization experiments, the relative error stays within 1%. The proposed framework provides an efficient solution for UAV swarm mission planning, intelligent transportation path optimization, and logistics distribution scheduling. In addition, it offers a methodological reference for applying multi-agent reinforcement learning to complex constrained optimization problems.
Keywords: low-altitude traffic, unmanned aerial vehicle, path optimization, deep reinforcement learning, team orienteering problem, traffic monitoring
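
The masking idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`feasible_mask`, `greedy_rollout`), the Euclidean battery model, and the toy coordinates are all assumptions made for illustration, and a greedy nearest-target rule stands in for the learned Transformer decoder. The mask blocks repeated visits and any move that would leave a UAV unable to reach a battery replacement station; sharing one visited set across UAVs prevents two UAVs from claiming the same target. The reward is the number of visited target nodes.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance, used here as a stand-in for battery consumption."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def feasible_mask(coords: Dict[str, Point], stations: Set[str], current: str,
                  battery: float, visited: Set[str]) -> Dict[str, bool]:
    """Return True for every node the UAV may move to next (assumed rules)."""
    mask = {}
    for node, pos in coords.items():
        leg = dist(coords[current], pos)
        if node == current or leg > battery:
            mask[node] = False          # no self-loop, no out-of-range move
        elif node in stations:
            mask[node] = True           # reachable stations are always allowed
        elif node in visited:
            mask[node] = False          # no repeated or conflicting target visit
        else:
            remaining = battery - leg   # must still reach some station afterwards
            mask[node] = any(dist(pos, coords[s]) <= remaining for s in stations)
    return mask


def greedy_rollout(coords: Dict[str, Point], stations: Set[str], starts: List[str],
                   capacity: float, max_steps: int = 100):
    """Greedy stand-in for the decoder: nearest feasible target, else swap battery."""
    targets = [n for n in coords if n not in stations]
    visited: Set[str] = set()           # shared by all UAVs
    routes = [[s] for s in starts]
    battery = [capacity] * len(starts)
    active = [True] * len(starts)
    for _ in range(max_steps):
        if not any(active):
            break
        for k in range(len(starts)):
            if not active[k]:
                continue
            cur = routes[k][-1]
            mask = feasible_mask(coords, stations, cur, battery[k], visited)
            nearest = lambda n: dist(coords[cur], coords[n])
            open_targets = [n for n in targets if mask[n]]
            if open_targets:
                nxt = min(open_targets, key=nearest)
                battery[k] -= dist(coords[cur], coords[nxt])
                visited.add(nxt)
            elif cur not in stations:
                nxt = min((s for s in stations if mask[s]), key=nearest)
                battery[k] = capacity   # battery replaced on arrival
            else:
                active[k] = False       # full battery and nothing left to reach
                continue
            routes[k].append(nxt)
    return routes, len(visited)         # reward = number of visited targets


# Toy scenario (hypothetical data): two stations, four targets, two UAVs.
COORDS = {"S0": (0, 0), "S1": (10, 0),
          "T1": (2, 0), "T2": (4, 0), "T3": (8, 0), "T4": (0, 50)}
routes, reward = greedy_rollout(COORDS, {"S0", "S1"}, ["S0", "S1"], capacity=8.0)
```

In this toy scenario the target `T4` lies beyond the battery range of every station, so the mask never offers it and both UAVs end their routes at a station. In a learned policy, the same mask would typically be applied to the decoder's scores before the softmax so that infeasible nodes receive zero probability.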
