
Collaborative control method for multi-UAV search trajectory in high-rise building emergency rescue

CHEN De-qi, ZHANG Zi-she, ZHANG Wen-hui, WANG Xian-bin

Citation: CHEN De-qi, ZHANG Zi-she, ZHANG Wen-hui, WANG Xian-bin. Collaborative control method for multi-UAV search trajectory in high-rise building emergency rescue[J]. Journal of Traffic and Transportation Engineering, 2026, 26(3): 303-316. doi: 10.19818/j.cnki.1671-1637.2026.159


doi: 10.19818/j.cnki.1671-1637.2026.159
Funds:

Heilongjiang Province Philosophy and Social Science Research Planning Project 23GLC022

National Natural Science Foundation of China 52572369

Details
    Author information:

    CHEN De-qi (1990-), male, from Harbin, Heilongjiang, lecturer, Doctor of Engineering, E-mail: chendeqi@nefu.edu.cn

    Corresponding author:

    ZHANG Wen-hui (1978-), male, from Harbin, Heilongjiang, professor, Doctor of Engineering, postdoctoral researcher, E-mail: zhangwenhui@nefu.edu.cn

  • CLC number: U8

  • Abstract: To address the low learning efficiency and insufficient policy robustness caused by the scarcity of key cooperative experiences, such as near-distance collisions between agents and formation reconfiguration, when multiple UAVs cooperatively perform high-rise building emergency search tasks, a multi-agent deep deterministic policy gradient model integrating prioritized experience replay (PER-MADDPG) was proposed. A UAV swarm simulation environment incorporating a six-degree-of-freedom dynamics model was constructed, the multi-UAV cooperative search task was abstracted as a multi-agent Markov decision process, and a multi-level reward function was designed to integrate individual trajectory tracking, energy consumption constraints, team formation keeping, and collision avoidance requirements. A centralized critic network computed the temporal-difference error of the team's joint actions to quantify joint experiences and perform prioritized sampling, guiding the algorithm to focus on high-value sparse cooperative samples and thereby accelerating the convergence of robust cooperative policies. The results show that the task success rate of the PER-MADDPG algorithm reaches 98%, a 15.3% improvement over the baseline MADDPG algorithm; the inter-agent collision rate drops from 8% to 1%; in terms of cooperation and control accuracy, the average formation error decreases from 0.07 m to 0.03 m, and the average trajectory tracking error decreases from 0.12 m to 0.05 m. In scalability tests with four-UAV and six-UAV formations, the method effectively overcomes the performance degradation caused by physical space congestion, exhibiting robustness superior to the baseline algorithms. The established PER-MADDPG model effectively balances individual control accuracy and team cooperation stability, improving search efficiency in high-rise building emergency rescue.
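The prioritized-sampling mechanism described in the abstract can be sketched as a proportional prioritized replay buffer. This is an illustrative sketch, not the authors' implementation: the class and method names are hypothetical, and only the exponents κ = 0.6 and β = 0.4 (Table 1) are taken from the paper. Each stored joint transition receives a priority p_i = (|δ_i| + ε)^κ from its TD error δ_i, and the sampling bias is corrected with importance-sampling weights w_i = (N·P(i))^(−β), normalized by their maximum.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (illustrative sketch).

    Priority: p_i = (|td_error_i| + eps) ** kappa.
    Importance-sampling weight: w_i = (N * P(i)) ** (-beta), max-normalized.
    """

    def __init__(self, capacity, kappa=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.kappa, self.beta, self.eps = kappa, beta, eps
        self.data, self.priorities = [], []
        self.pos = 0  # next slot to overwrite once the buffer is full

    def add(self, transition, td_error):
        p = (abs(td_error) + self.eps) ** self.kappa
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        pri = np.asarray(self.priorities)
        probs = pri / pri.sum()  # P(i) proportional to priority
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling correction, normalized so max weight is 1
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights
```

With κ = 0.6, a transition with a large TD error (e.g. a rare near-collision) is sampled far more often than a routine one, which is how the method concentrates learning on sparse high-value cooperative samples.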

     

  • Figure 1. PER-MADDPG algorithm framework

    Figure 2. Sensitivity analysis of priority indices

    Figure 3. Sensitivity analysis of importance sampling indices

    Figure 4. Average episode reward learning curves for each algorithm

    Figure 5. Policy entropy variation curves for each algorithm

    Figure 6. Comparison of three-dimensional collaborative trajectories among different algorithms

    Figure 7. Comparison of time series of trajectory tracking errors for each algorithm

    Figure 8. Comparison of time series of formation errors for each algorithm

    Figure 9. Box plot of statistical distributions for core performance metrics of each algorithm

    Figure 10. Comparison of task success rate and system search efficiency under different formation sizes

    Figure 11. Comparison of three-dimensional trajectories for collaborative spiral scanning of four-UAV formation

    Figure 12. Side-view comparison of trajectory tracking for four-UAV formation

    Figure 13. Statistical distribution of trajectory tracking errors under different formation sizes

    Table 1. Key hyperparameter settings for the algorithms (a single value applies to all four algorithms)

    Hyperparameter      Description                               PER-MADDPG  MADDPG  MAPPO  IPPO
    Learning rate       Actor/Critic network learning rate        3.0×10^-5 (all)
    Discount factor     Discount coefficient for future rewards   0.99 (all)
    Replay buffer size  Maximum number of stored transitions      1.0×10^6 (all)
    κ                   Sampling priority exponent                0.6
    β                   Importance sampling exponent              0.4
    Soft update rate    Target network update rate                0.005       0.005
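The soft update rate τ = 0.005 in Table 1 refers to the Polyak averaging of target-network parameters used in DDPG-style methods: after each training step, each target parameter moves a small fraction τ toward its online counterpart. A minimal sketch, with illustrative parameter lists standing in for actual network weights:

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target.

    target_params / online_params are parallel lists of scalar parameters
    (a stand-in for network weight tensors); returns the updated targets.
    """
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]
```

Because τ is small, the target networks change slowly, which stabilizes the TD targets used to compute the errors that drive the prioritized sampling above.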

    Table 2. Comparison of core performance metrics for each algorithm

    Performance metric            IPPO           MAPPO          MADDPG        PER-MADDPG
    Task success rate/%           35.0 ±4.8      70.0 ±4.6      85.0 ±3.6     98.0 ±1.4
    Inter-agent collision rate/%  25.0 ±4.3      12.0 ±3.2      8.0 ±2.7      1.0 ±1.0
    Task completion time/s        38.0 ±4.0      34.0 ±2.5      32.0 ±1.5     30.5 ±0.8
    Trajectory tracking error/m   0.28 ±0.10     0.18 ±0.06     0.12 ±0.04    0.05 ±0.02
    Formation error/m             0.18 ±0.08     0.10 ±0.05     0.07 ±0.04    0.03 ±0.015
    Control command cost          31 500 ±2 500  26 800 ±1 500  24 500 ±900   22 800 ±650
Publication history
  • Received: 2025-10-10
  • Accepted: 2026-01-23
  • Revised: 2026-01-03
  • Published: 2026-03-28
