
Collaborative control method for multi-UAV search trajectory in high-rise building emergency rescue

CHEN De-qi, ZHANG Zi-she, ZHANG Wen-hui, WANG Xian-bin

Citation: CHEN De-qi, ZHANG Zi-she, ZHANG Wen-hui, WANG Xian-bin. Collaborative control method for multi-UAV search trajectory in high-rise building emergency rescue[J]. Journal of Traffic and Transportation Engineering, 2026, 26(3): 303-316. doi: 10.19818/j.cnki.1671-1637.2026.159


doi: 10.19818/j.cnki.1671-1637.2026.159
Funds:

Heilongjiang Province Philosophy and Social Science Research Planning Project 23GLC022

National Natural Science Foundation of China 52572369

Details
    Author information:

    CHEN De-qi (1990-), male, from Harbin, Heilongjiang, lecturer, Doctor of Engineering, E-mail: chendeqi@nefu.edu.cn

    Corresponding author:

    ZHANG Wen-hui (1978-), male, from Harbin, Heilongjiang, professor, Doctor of Engineering, postdoctoral researcher, E-mail: zhangwenhui@nefu.edu.cn

  • CLC number: U8

  • Abstract: To address the low learning efficiency and insufficient policy robustness caused by the scarcity of key cooperative experiences, such as near-distance collisions between agents and formation reconfiguration, when multiple UAVs cooperatively perform high-rise building emergency search tasks, a multi-agent deep deterministic policy gradient model integrating prioritized experience replay (PER-MADDPG) was proposed. A UAV swarm simulation environment incorporating a six-degree-of-freedom dynamics model was constructed, the multi-UAV cooperative search task was abstracted as a multi-agent Markov decision process, and a multi-level reward function was designed to integrate individual trajectory tracking, energy consumption constraints, team formation keeping, and collision avoidance requirements. A centralized critic network computed the temporal-difference error of the team's joint actions to quantify joint experiences and perform prioritized sampling, guiding the algorithm to focus on high-value sparse cooperative samples and thereby accelerating the convergence of robust cooperative policies. The results show that the task success rate of the PER-MADDPG algorithm reaches 98%, a 15.3% improvement over the baseline MADDPG algorithm; the inter-agent collision rate drops from 8% to 1%; in terms of cooperation and control accuracy, the average formation error decreases from 0.07 m to 0.03 m, and the average trajectory tracking error decreases from 0.12 m to 0.05 m. In scalability tests with four-UAV and six-UAV formations, the method effectively overcomes the performance degradation caused by physical space congestion, exhibiting robustness superior to the baseline algorithms. The established PER-MADDPG model effectively balances individual control accuracy and team cooperation stability, improving search efficiency in high-rise building emergency rescue.
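The prioritized-sampling mechanism described in the abstract can be sketched as a proportional prioritized replay buffer. This is an illustrative sketch, not the authors' implementation: the class and method names are hypothetical, and only the exponents κ = 0.6 and β = 0.4 (Table 1) are taken from the paper. Each stored joint transition receives a priority p_i = (|δ_i| + ε)^κ from its TD error δ_i, and the sampling bias is corrected with importance-sampling weights w_i = (N·P(i))^(−β), normalized by their maximum.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (illustrative sketch).

    Priority: p_i = (|td_error_i| + eps) ** kappa.
    Importance-sampling weight: w_i = (N * P(i)) ** (-beta), max-normalized.
    """

    def __init__(self, capacity, kappa=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.kappa, self.beta, self.eps = kappa, beta, eps
        self.data, self.priorities = [], []
        self.pos = 0  # next slot to overwrite once the buffer is full

    def add(self, transition, td_error):
        p = (abs(td_error) + self.eps) ** self.kappa
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        pri = np.asarray(self.priorities)
        probs = pri / pri.sum()  # P(i) proportional to priority
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling correction, normalized so max weight is 1
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights
```

With κ = 0.6, a transition with a large TD error (e.g. a rare near-collision) is sampled far more often than a routine one, which is how the method concentrates learning on sparse high-value cooperative samples.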

     

  • Figure 1. PER-MADDPG algorithm framework

    Figure 2. Sensitivity analysis of priority indices

    Figure 3. Sensitivity analysis of importance sampling indices

    Figure 4. Average episode reward learning curves for each algorithm

    Figure 5. Policy entropy variation curves for each algorithm

    Figure 6. Comparison of three-dimensional collaborative trajectories among different algorithms

    Figure 7. Comparison of time series of trajectory tracking errors for each algorithm

    Figure 8. Comparison of time series of formation errors for each algorithm

    Figure 9. Box plot of statistical distributions for core performance metrics of each algorithm

    Figure 10. Comparison of task success rate and system search efficiency under different formation sizes

    Figure 11. Comparison of three-dimensional trajectories for collaborative spiral scanning of four-UAV formation

    Figure 12. Side-view comparison of trajectory tracking for four-UAV formation

    Figure 13. Statistical distribution of trajectory tracking errors under different formation sizes

    Table 1. Key hyperparameter settings for the algorithms (a single value applies to all four algorithms)

    Hyperparameter      Description                               PER-MADDPG  MADDPG  MAPPO  IPPO
    Learning rate       Actor/Critic network learning rate        3.0×10^-5 (all)
    Discount factor     Discount coefficient for future rewards   0.99 (all)
    Replay buffer size  Maximum number of stored transitions      1.0×10^6 (all)
    κ                   Sampling priority exponent                0.6
    β                   Importance sampling exponent              0.4
    Soft update rate    Target network update rate                0.005       0.005
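The soft update rate τ = 0.005 in Table 1 refers to the Polyak averaging of target-network parameters used in DDPG-style methods: after each training step, each target parameter moves a small fraction τ toward its online counterpart. A minimal sketch, with illustrative parameter lists standing in for actual network weights:

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target.

    target_params / online_params are parallel lists of scalar parameters
    (a stand-in for network weight tensors); returns the updated targets.
    """
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]
```

Because τ is small, the target networks change slowly, which stabilizes the TD targets used to compute the errors that drive the prioritized sampling above.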

    Table 2. Comparison of core performance metrics for each algorithm

    Performance metric            IPPO           MAPPO          MADDPG        PER-MADDPG
    Task success rate/%           35.0 ±4.8      70.0 ±4.6      85.0 ±3.6     98.0 ±1.4
    Inter-agent collision rate/%  25.0 ±4.3      12.0 ±3.2      8.0 ±2.7      1.0 ±1.0
    Task completion time/s        38.0 ±4.0      34.0 ±2.5      32.0 ±1.5     30.5 ±0.8
    Trajectory tracking error/m   0.28 ±0.10     0.18 ±0.06     0.12 ±0.04    0.05 ±0.02
    Formation error/m             0.18 ±0.08     0.10 ±0.05     0.07 ±0.04    0.03 ±0.015
    Control command cost          31 500 ±2 500  26 800 ±1 500  24 500 ±900   22 800 ±650
Publication history
  • Received: 2025-10-10
  • Accepted: 2026-01-23
  • Revised: 2026-01-03
  • Published: 2026-03-28
