Volume 26 Issue 4
Apr.  2026
Citation: LIU Xiao-bo, XUANYUAN Jing-yi, XIE Yuan-zhi, ZHENG Fang-fang. Cooperative traffic monitoring path optimization for multiple unmanned aerial vehicles based on multi-agent reinforcement learning[J]. Journal of Traffic and Transportation Engineering, 2026, 26(4): 15-32. doi: 10.19818/j.cnki.1671-1637.2026.161

Cooperative traffic monitoring path optimization for multiple unmanned aerial vehicles based on multi-agent reinforcement learning

doi: 10.19818/j.cnki.1671-1637.2026.161
Funds:

Key Program of National Natural Science Foundation of China 52232011

Science and Technology Planning Project of Sichuan Province 2025YFHZ0193

Science and Technology Planning Project of Sichuan Province 2025HJPJ0011

  • Corresponding author: ZHENG Fang-fang, professor, PhD, E-mail: fzheng@swjtu.cn
  • Received Date: 2025-09-04
  • Accepted Date: 2026-01-23
  • Revision Received Date: 2025-12-03
  • Publish Date: 2026-04-28
Abstract: To optimize cooperative traffic-monitoring path planning for multiple unmanned aerial vehicles (multi-UAV) under battery replacement station constraints, a mixed-integer linear programming model was formulated based on the UAV team orienteering problem, and a clustering method was used to locate the battery replacement stations so that they are uniformly distributed. A multi-agent Transformer-based reinforcement learning (MTRL) framework with a centralized Transformer architecture was proposed. The encoder learns a global graph-structured representation of the scenario through a multi-head attention mechanism, and the decoder generates the cooperative paths. A reward function based on the number of visited target nodes was designed to optimize the UAV visiting sequence and battery replacement strategy. A structured masking mechanism was introduced to eliminate subtours, repeated visits, and path conflicts, ensuring solution feasibility. Numerical experiments were conducted on nine scenario scales with varying numbers of target nodes, battery replacement stations, and UAVs. The results show that MTRL obtains high-quality feasible solutions in all nine scenario types with stable training convergence. Compared with the commercial solver, the average cumulative reward of MTRL increases by 9.77%-28.77% in small- and medium-scale scenarios and by 9.34%-14.84% in large-scale scenarios, whereas that of the genetic algorithm and tabu search decreases by 28%-41% in large-scale scenarios. The inference time remains at the millisecond level. In 18 groups of cross-distribution generalization experiments, the relative error stays within 1%. The proposed framework provides an efficient solution for UAV swarm mission planning, intelligent transportation path optimization, and logistics distribution scheduling. In addition, it offers a methodological reference for applying multi-agent reinforcement learning to complex constrained optimization problems.
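
The masking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `feasible_mask`, the treatment of battery as remaining flight distance, and the rule that a UAV must keep enough charge to reach some station after visiting a target are all assumptions made for illustration. The sketch only shows how a per-step mask could rule out repeated visits, targets already claimed by another UAV, and moves that would strand a UAV.

```python
from math import dist


def feasible_mask(coords, is_station, visited, claimed, position, battery):
    """Return one boolean per node: True if the UAV may fly there next.

    Hypothetical rules (assumptions, not taken from the paper):
      - a UAV never selects the node it is already on;
      - a station is allowed if it is within the remaining range;
      - a target is allowed only if it is unvisited, not claimed by
        another UAV in this step, and the UAV could still reach some
        station after visiting it.
    `battery` is modelled as remaining flight distance.
    """
    stations = [i for i, flag in enumerate(is_station) if flag]
    mask = []
    for node, xy in enumerate(coords):
        leg = dist(coords[position], xy)
        if node == position:
            ok = False  # no self-loop
        elif is_station[node]:
            ok = leg <= battery
        else:
            # Distance from this target to its nearest station.
            back = min(dist(xy, coords[s]) for s in stations)
            ok = (
                node not in visited
                and node not in claimed
                and leg + back <= battery
            )
        mask.append(ok)
    return mask


# Toy scenario: node 0 is a station, nodes 1-3 are targets.
coords = [(0, 0), (3, 4), (6, 8), (0, 1)]
is_station = [True, False, False, False]
mask = feasible_mask(coords, is_station, visited={3}, claimed=set(),
                     position=0, battery=12)
print(mask)
```

In a decoder of the kind the abstract describes, a mask like this would typically be applied to the attention scores before the softmax, so that infeasible nodes receive zero selection probability.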

     

