Volume 26 Issue 3
Mar. 2026
CHEN De-qi, ZHANG Zi-she, ZHANG Wen-hui, WANG Xian-bin. Collaborative control method for multi-UAV search trajectory in high-rise building emergency rescue[J]. Journal of Traffic and Transportation Engineering, 2026, 26(3): 303-316. doi: 10.19818/j.cnki.1671-1637.2026.159

Collaborative control method for multi-UAV search trajectory in high-rise building emergency rescue

doi: 10.19818/j.cnki.1671-1637.2026.159
Funds:

Heilongjiang Province Philosophy and Social Science Research Planning Project 23GLC022

National Natural Science Foundation of China 52572369

  • Corresponding author: ZHANG Wen-hui, professor, PhD, E-mail: zhangwenhui@nefu.edu.cn
  • Received Date: 2025-10-10
  • Revised Date: 2026-01-03
  • Accepted Date: 2026-01-23
  • Published Date: 2026-03-28
Multi-UAV systems performing collaborative emergency search in high-rise buildings suffer from low learning efficiency and poor policy robustness, because experience involving close-range inter-agent collisions and formation reconfiguration is scarce. To address this, a multi-agent deep deterministic policy gradient model integrated with prioritized experience replay (PER-MADDPG) was proposed. A UAV swarm simulation environment based on a six-degree-of-freedom (6-DOF) dynamic model was established. The multi-UAV collaborative search task was formulated as a multi-agent Markov decision process, and a hierarchical reward function was designed that integrates individual trajectory tracking, energy constraints, team formation keeping, and collision avoidance. A centralized critic network was used to compute the temporal difference (TD) errors of the team's joint actions; these errors quantified the priority of each joint experience, and prioritized sampling was performed accordingly. This guided the algorithm toward sparse, high-value collaborative samples and accelerated convergence to robust collaborative policies. Experimental results show that PER-MADDPG achieves a task success rate of 98%, 15.3% higher than that of the baseline MADDPG algorithm, and reduces the inter-agent collision rate from 8% to 1%. In terms of collaboration and control accuracy, the average formation error decreases from 0.07 m to 0.03 m, and the average trajectory tracking error decreases from 0.12 m to 0.05 m. In scalability tests with four-UAV and six-UAV formations, PER-MADDPG effectively overcomes the performance degradation caused by physical space congestion and is more robust than the baseline algorithms. The proposed PER-MADDPG effectively balances individual control accuracy and team collaboration stability, improving search efficiency in high-rise building emergency rescue.
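To make the prioritized-replay idea concrete, the sketch below implements standard proportional prioritized experience replay (Schaul et al.) over joint team transitions, with priorities driven by a TD error from MADDPG-style centralized critics. It is a minimal illustration under stated assumptions, not the authors' implementation: the names `team_td_error` and `PrioritizedJointReplay`, the hyperparameters `alpha`, `beta`, `eps`, and `gamma`, and the choice to average the absolute per-agent TD errors into one team priority are all hypothetical.

```python
import random
from typing import Any, List, Optional, Sequence, Tuple


def team_td_error(
    rewards: Sequence[float],
    q_values: Sequence[float],
    next_q_values: Sequence[float],
    dones: Sequence[float],
    gamma: float = 0.95,
) -> float:
    """Scalar TD error for one joint transition (hypothetical aggregation).

    q_values[i] is assumed to be agent i's centralized critic Q_i(x, a_1..a_N);
    next_q_values[i] is its target critic at the target actors' joint action.
    Averaging |delta_i| over agents is an assumption, not the paper's rule.
    """
    deltas = [
        r + gamma * (1.0 - d) * nq - q
        for r, q, nq, d in zip(rewards, q_values, next_q_values, dones)
    ]
    return sum(abs(x) for x in deltas) / len(deltas)


class PrioritizedJointReplay:
    """Proportional prioritized replay over joint (team) transitions."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-6):
        self.capacity = capacity
        self.alpha = alpha      # 0 = uniform sampling, 1 = fully priority-driven
        self.eps = eps          # keeps zero-error transitions sampleable
        self.data: List[Any] = []
        self.priorities: List[float] = []
        self.pos = 0            # next ring-buffer slot to overwrite

    def add(self, joint_transition: Any) -> None:
        # New experience gets the current max priority so it is replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(joint_transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = joint_transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(
        self, batch_size: int, beta: float = 0.4, rng: Optional[random.Random] = None
    ) -> Tuple[List[int], List[Any], List[float]]:
        if rng is None:
            rng = random.Random()
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]  # P(i) = p_i^alpha / sum_k p_k^alpha
        n = len(self.data)
        idx = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias of non-uniform sampling;
        # dividing by the largest possible weight keeps every weight <= 1.
        w_max = (n * min(probs)) ** (-beta)
        weights = [(n * probs[i]) ** (-beta) / w_max for i in idx]
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, indices: Sequence[int], td_errors: Sequence[float]) -> None:
        # After a critic update, refresh priorities with the new team TD errors.
        for i, delta in zip(indices, td_errors):
            self.priorities[i] = abs(delta) + self.eps
```

In a training loop, each sampled joint transition's critic loss would be multiplied by its importance weight, and `update_priorities` would then be called with the recomputed TD errors, so that rare collision or reconfiguration transitions with large errors keep being replayed until the critics fit them.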
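The hierarchical reward can be pictured as three stacked layers: an individual layer (trajectory tracking and energy), a team layer (formation keeping), and a safety layer (collision avoidance). The sketch below is one plausible shape for such a function, assuming Euclidean distances, a quadratic control cost as the energy proxy, a formation defined as fixed offsets from the team centroid, and a fixed penalty inside a safety radius. The function name `hierarchical_rewards`, all weights (`w_track`, `w_energy`, `w_form`), `d_safe`, and `collision_penalty` are hypothetical; the paper's actual terms and coefficients may differ.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def hierarchical_rewards(
    positions: Sequence[Vec3],             # current UAV positions (m)
    references: Sequence[Vec3],            # each UAV's reference trajectory point (m)
    offsets: Sequence[Vec3],               # desired slot relative to the team centroid (m)
    controls: Sequence[Sequence[float]],   # control inputs, used as an energy proxy
    w_track: float = 1.0,
    w_energy: float = 0.01,
    w_form: float = 0.5,
    d_safe: float = 0.5,
    collision_penalty: float = 10.0,
) -> List[float]:
    """Per-UAV reward = individual layer + shared team layer + safety layer."""
    n = len(positions)
    centroid = tuple(sum(p[k] for p in positions) / n for k in range(3))
    # Team layer: mean distance between each UAV and its formation slot,
    # shared by every agent so all UAVs are rewarded for keeping formation.
    formation_error = sum(
        math.dist(p, tuple(c + o for c, o in zip(centroid, off)))
        for p, off in zip(positions, offsets)
    ) / n
    rewards = []
    for i in range(n):
        # Individual layer: tracking error plus a quadratic control (energy) cost.
        r_individual = (-w_track * math.dist(positions[i], references[i])
                        - w_energy * sum(u * u for u in controls[i]))
        r_team = -w_form * formation_error
        # Safety layer: fixed penalty if any teammate is inside the safety radius.
        too_close = any(
            math.dist(positions[i], positions[j]) < d_safe
            for j in range(n) if j != i
        )
        r_safety = -collision_penalty if too_close else 0.0
        rewards.append(r_individual + r_team + r_safety)
    return rewards
```

Making the collision penalty much larger than the tracking and formation terms is a common way to let safety dominate when layers conflict; such collision events are also exactly the sparse, high-TD-error transitions that prioritized replay is meant to surface.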
