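Unmanned aerial vehicle cruise risk identification technology based on multi-source data and large models

MA Tao^1, WU Jun^1, TANG Fan-long^2, FAN Jian-wei^2, WANG Ning^3

doi: 10.19818/j.cnki.1671-1637.2026.036

1. School of Transportation, Southeast University, Nanjing 211189, Jiangsu, China
2. School of Network and Communication Engineering, Jinling Institute of Technology, Nanjing 211169, Jiangsu, China
3. College of Civil Engineering, Nanjing Forestry University, Nanjing 210037, Jiangsu, China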

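Funds:

National Key R&D Program of China 2020YFB1600102

National Natural Science Foundation of China 52378445

Xizang Autonomous Region Science and Technology Funding XZ202501JX0006

Foundation of Jinling Institute of Technology jit-b-202401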

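Author biography: MA Tao (1981-), male, from Xuzhou, Jiangsu, professor, doctoral supervisor, PhD in engineering, E-mail: matao@seu.edu.cn

Corresponding author: TANG Fan-long (1988-), male, from Yichang, Hubei, lecturer, PhD in engineering, E-mail: tangfanlong@jit.edu.cn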

CLC number: U8

Publication history
- Received: 2025-07-03
- Revised: 2025-08-19
- Accepted: 2025-09-26
- Published: 2026-03-28


Abstract: To address the identification of complex risk events during unmanned aerial vehicle (UAV) cruise, the basic elements of UAV cruise risk were examined, and the risk feature parameters that a prompt should contain were specified. The implementation methods, architectures, and typical models of multimodal large models were analyzed, and a scheme for fusing multi-source data in the prompt generation model was proposed. Environmental perception was combined with detection, identification, and tracking methods to build a prompt generation model with three modules: macroscopic scene description, dynamic scene supplementation, and sudden risk detection. The extracted feature parameters were integrated into the prompt, and DeepSeek performed a comprehensive analysis to identify and judge UAV cruise risks. The results show that the three modules can quickly identify UAV cruise risks and produce complete prompts. The static scene description based on the Owl-ViT model effectively identifies static obstacles in flight, with confidence exceeding 80%. The dynamic object capture based on the ByteTrack algorithm quickly obtains dynamic information such as the distance, speed, and coordinates of moving objects, including birds and other UAVs. The point-cloud-based sudden risk identification captures obstacle information such as target distance, size, volume, and aspect ratio, and quickly detects obstacles that suddenly enter the safety zone. The DeepSeek output generated from the prompt details the risk content and risk level during cruise and gives safety suggestions. The developed UAV cruise risk identification system visualizes the perception and identification data and specifies the equipment and task information for each mission, which further assists DeepSeek in risk judgment. The results can provide effective technical support for risk identification and for safe, efficient flight during UAV cruise.

Keywords: low-altitude traffic; UAV; large model; cruise risk identification technology; prompt generation model; multi-source data
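The abstract describes integrating the feature parameters from the three modules into a single prompt. A minimal sketch of that assembly step follows. The class names, field names, prompt wording, and the 0.8 confidence cut-off are illustrative assumptions, not the authors' implementation, and the call to DeepSeek itself is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StaticObstacle:            # macroscopic scene description (hypothetical schema)
    label: str
    confidence: float            # detector confidence in [0, 1]


@dataclass
class DynamicObject:             # dynamic scene supplementation (hypothetical schema)
    label: str
    distance_m: float
    speed_mps: float
    position: Tuple[float, float, float]


@dataclass
class SuddenObstacle:            # sudden risk detection (hypothetical schema)
    distance_m: float
    size_m: Tuple[float, float, float]
    volume_m3: float
    aspect_ratio: float


def build_prompt(static: List[StaticObstacle],
                 dynamic: List[DynamicObject],
                 sudden: List[SuddenObstacle],
                 min_conf: float = 0.8) -> str:
    """Merge the outputs of the three modules into one text prompt."""
    lines = ["Macroscopic scene (static obstacles):"]
    # Keep only confident detections; the 0.8 threshold is an assumed value.
    kept = [s for s in static if s.confidence >= min_conf]
    lines += [f"- {s.label} (confidence {s.confidence:.2f})" for s in kept] or ["- none"]

    lines.append("Dynamic scene (tracked objects):")
    lines += [
        f"- {d.label}: distance {d.distance_m:.1f} m, "
        f"speed {d.speed_mps:.1f} m/s, position {d.position}"
        for d in dynamic
    ] or ["- none"]

    lines.append("Sudden risks (point cloud obstacles in the safety zone):")
    lines += [
        f"- distance {o.distance_m:.1f} m, size {o.size_m} m, "
        f"volume {o.volume_m3:.2f} m^3, aspect ratio {o.aspect_ratio:.2f}"
        for o in sudden
    ] or ["- none"]

    # Ask for the three outputs named in the abstract.
    lines.append("Report the risk content, the risk level, and safety suggestions.")
    return "\n".join(lines)


prompt = build_prompt(
    [StaticObstacle("building", 0.91), StaticObstacle("tree", 0.42)],
    [DynamicObject("bird", 35.0, 8.5, (12.0, -3.0, 30.0))],
    [SuddenObstacle(6.2, (1.2, 0.8, 0.5), 0.48, 1.5)],
)
print(prompt)
```

In this sketch the low-confidence "tree" detection is dropped before the prompt is built, so the language model only sees detections that pass the assumed threshold.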

Figure 1. Decomposition of cruise risk factors

Figure 2. Data fusion schemes for the prompt generation model

Figure 3. Static scene detection results during UAV cruise

Figure 4. Dynamic obstacle tracking results during UAV cruise

Figure 5. Detection results for obstacles entering the safety zone during UAV cruise

Figure 6. Real-time monitoring module

Figure 7. Equipment management module

Figure 8. Task execution module

Figure 9. User configuration module

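Table 1. Types of risk events during UAV cruise

Risk type	Main content
Collision risk	Collision with stationary obstacles, other aircraft, or ground objects
Environmental risk	Extreme weather (strong wind, thunderstorms), electromagnetic interference, loss of GPS signal
Technical failure risk	Failure of power systems such as batteries and motors; navigation/communication interruption; sensor failure
Human operation risk	Incorrect operating commands, unreasonable mission planning, delayed emergency response
Communication link risk	Control signal interruption; data link delay or hijacking
Regulatory and compliance risk	Violation of airspace control regulations; operating without airworthiness certification or beyond authorized operating limits
Malicious attack risk	Hacker intrusion, GPS spoofing, physical hijacking, or electromagnetic jamming attacks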

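Table 2. Factors affecting the safe operation of UAVs

Factor	Sub-factor	Details
Internal factors	UAV design reliability	Redundant systems, fault diagnosis capability
	Energy system status	Battery life, remaining charge
	Sensor and algorithm accuracy	Target detection and identification, obstacle avoidance, path planning capability
External factors	Meteorological conditions	Wind speed, visibility, temperature
	Airspace complexity	Obstacle density, number of other aircraft
	Electromagnetic environment	Interference source strength, spectrum occupancy
	Human factors	Operator experience, completeness of emergency plans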

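Table 3. Comparison of indicators for data fusion schemes

Scheme	Training data requirement	Compute requirement (10⁹ FLOPS)	Generalization ability
1	Low (uses single-modality pre-trained models)	0.8	Medium (depends on module design)
2	High (cross-modal alignment data)	2.5	Strong (mutual learning between modalities)
3	Very high (massive multimodal data)	5.0+	Very strong (end-to-end adaptive)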

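Table 4. Applicable scenarios and advantages of the data fusion schemes

Scheme	Applicable scenarios	Main advantages
1	Dynamic scenarios with high real-time requirements	Computationally efficient, easy to deploy, supports incremental updates
2	Tasks with strong multimodal interaction	Deep cross-modal understanding; supports complex semantic reasoning
3	Offline high-precision analysis	Joint optimization across all modalities; outstanding few-shot generalization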

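Table 5. Characteristics of mainstream object detection models

Characteristic	Fast R-CNN	YOLO	DETR	GLIP	Owl-ViT
Architecture basis	CNN	CNN	Transformer	Transformer	Transformer
Training method	Supervised learning	Supervised learning	Supervised learning	Contrastive learning	Contrastive learning
Detection mode	Closed vocabulary	Closed vocabulary	Closed vocabulary	Open vocabulary	Open vocabulary
Zero-shot capability	None	None	Limited	Excellent	Excellent
Multimodal support	No	No	No	Yes	Yes
Inference speed	Relatively slow	Very fast	Medium	Relatively slow	Medium
Main advantage	Mature and stable	Real-time performance	End-to-end detection	Strong zero-shot capability	Open-world adaptability
Main limitation	Fixed categories	Fixed categories	Complex training	High computing resource demand	Weak small object detection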

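Table 6. Comparison of the Owl-ViT and GLIP models across dimensions

Comparison dimension	Owl-ViT	GLIP
Design philosophy	Open-world visual localization	Unified framework for detection and grounding
Model architecture	ViT + text Transformer	Swin/BERT + deep fusion module
Pre-training data	400 M image-text pairs	27 M detection data + 3 M human-annotated data
Task form	Image-text contrastive learning	Region-phrase matching
Zero-shot transfer	Through text prompts	Dynamic detection head generated from language descriptions
Detection head design	Fixed structure	Dynamically generated
Typical application scenarios	Open-vocabulary retrieval and localization	Fine-grained visual-semantic understanding
Inference speed	Medium (about 15 frames·s^-1)	Relatively slow (about 8 frames·s^-1)
Model size	Base version about 300 M parameters	Large version up to 1 B parameters
Small object detection	Relatively weak	Better (Swin hierarchical features)
Language understanding depth	Focuses on keyword matching	Supports understanding of complex semantic relations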

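Table 7. Comparison of the core ideas and data association strategies of tracking algorithms

Algorithm	Core idea	Data association strategy
SORT	Kalman filter + Hungarian algorithm; uses only high-score detection boxes	IoU matching
DeepSORT	Adds ReID (appearance) features to reduce ID switches	IoU + appearance similarity
FairMOT	Joint detection and tracking (JDE); end-to-end training; shared feature extractor	Center-point-based association
OC-SORT	Improved motion model; introduces a trajectory compensation mechanism to predict target position; more robust to nonlinear motion	Motion consistency + observation smoothing
ByteTrack	Two-pass matching (high-score + low-score detection boxes) to reduce missed detections; Kalman filter predicts bounding box positions; relies on detection quality rather than a complex motion model	Based on IoU and (optionally) appearance similarity; more robust to occluded targets
BoT-SORT	Combines ByteTrack's low-score strategy with DeepSORT's ReID module; balances speed and accuracy	IoU + appearance similarity + motion compensation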
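The two-pass matching listed for ByteTrack in Table 7 can be sketched as follows. This is a simplified illustration under stated assumptions: the score thresholds (`high`, `low`) and `min_iou` are made-up values, greedy IoU matching stands in for the Hungarian algorithm, and the predicted track boxes are passed in directly instead of coming from a Kalman filter.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, dets, min_iou):
    """Pair track ids with detection indices, highest IoU first (assumed simplification)."""
    pairs = sorted(
        ((iou(box, d["box"]), tid, j)
         for tid, box in tracks.items() for j, d in enumerate(dets)),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, tid, j in pairs:
        if score < min_iou:
            break                      # remaining pairs overlap too little
        if tid in used_tracks or j in used_dets:
            continue
        matches[tid] = j
        used_tracks.add(tid)
        used_dets.add(j)
    return matches


def associate(tracks, detections, high=0.6, low=0.1, min_iou=0.3):
    """Two-pass association: high-score boxes first, then low-score boxes
    for the tracks that are still unmatched."""
    high_dets = [d for d in detections if d["score"] >= high]
    low_dets = [d for d in detections if low <= d["score"] < high]

    first = greedy_match(tracks, high_dets, min_iou)
    assigned = {tid: high_dets[j] for tid, j in first.items()}

    remaining = {tid: box for tid, box in tracks.items() if tid not in assigned}
    second = greedy_match(remaining, low_dets, min_iou)
    assigned.update({tid: low_dets[j] for tid, j in second.items()})

    matched_high = set(first.values())
    new_tracks = [d for j, d in enumerate(high_dets) if j not in matched_high]
    lost = [tid for tid in tracks if tid not in assigned]
    return assigned, new_tracks, lost


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}      # predicted boxes per track id
detections = [
    {"box": (1, 1, 11, 11), "score": 0.9},     # confident detection near track 1
    {"box": (21, 20, 31, 30), "score": 0.3},   # low score, e.g. a partly occluded target
    {"box": (50, 50, 60, 60), "score": 0.8},   # confident detection with no track
    {"box": (70, 70, 80, 80), "score": 0.05},  # below the low threshold, ignored
]
assigned, new_tracks, lost = associate(tracks, detections)
```

Here track 2 is recovered in the second pass from a low-score box that a tracker using only high-score boxes would discard, which is the behaviour Table 7 credits with reducing missed detections.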

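Table 8. Comparison of application scenarios and characteristics of tracking algorithms

Algorithm	Applicable scenarios	Advantages	Disadvantages
SORT	Scenarios with high real-time requirements and little occlusion (e.g., simple traffic monitoring)	Low computational cost; fast, up to 60 frames·s^-1	Discards low-score boxes; easily loses targets under occlusion, causing frequent ID switches; depends on detection quality
DeepSORT	Scenarios requiring high ID consistency with ample computing power (e.g., sports event analysis)	High ID stability; good long-term tracking	ReID module is computationally expensive (40 frames·s^-1); discards low-score detection boxes
FairMOT	Pedestrian tracking, multi-camera ReID (e.g., shopping mall surveillance)	End-to-end and efficient; detection and tracking jointly optimized	ReID and detection tasks conflict; performance drops under occlusion; high computational cost
OC-SORT	Complex motion trajectories (e.g., tracking from a UAV viewpoint, targets that change direction quickly)	Handles complex motion trajectories; reduces ID switches	Higher algorithmic complexity; slightly worse real-time performance (30 frames·s^-1)
ByteTrack	Scenarios with frequent occlusion that require real-time performance (e.g., real-time monitoring, traffic flow analysis)	Uses low-score detection boxes for association to reduce missed detections; fast (50 frames·s^-1); robust to occlusion	No ReID module; weaker at distinguishing similar targets
BoT-SORT	Scenarios requiring both high accuracy and real-time performance (e.g., autonomous driving, high-precision monitoring)	Combines the strengths of ByteTrack and ReID; stable IDs; relatively fast	Requires a pre-trained ReID model; configuration is more complex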

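Table 9. Measured results for the three detection modules and DeepSeek

Module	Implementation model	Average latency	Recognition accuracy
Macroscopic scene description	Owl-ViT-Large	60~80 ms	75%
Dynamic scene supplementation	Owl-ViT-Large+ByteTrack	The tracking algorithm adds only 5~10 ms; latency is dominated by detection	Depends on the detection model; accuracy reaches 75% with stable IDs
Sudden risk detection	Point cloud processing, cluster analysis, multi-frame association, obstacle attribute computation	25~35 ms	85%
Cruise risk judgment	DeepSeek	40~50 ms	80%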
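The last step of the sudden risk detection pipeline in Table 9, obstacle attribute computation, can be illustrated with a short sketch. It assumes the cluster is already segmented, the points are (x, y, z) coordinates in metres with the UAV at the origin, the size comes from an axis-aligned bounding box, and the aspect ratio is the longest box edge divided by the shortest. The function name and the 10 m safety radius are hypothetical.

```python
import math


def obstacle_attributes(points, safe_radius_m=10.0):
    """Compute distance, size, volume, and aspect ratio for one point cloud cluster.

    points: iterable of (x, y, z) tuples in the UAV frame (assumed convention).
    safe_radius_m: assumed radius of the safety zone around the UAV.
    """
    xs, ys, zs = zip(*points)
    # Edge lengths of the axis-aligned bounding box around the cluster.
    size = (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))
    volume = size[0] * size[1] * size[2]
    # Distance from the UAV (origin) to the nearest point of the cluster.
    distance = min(math.sqrt(x * x + y * y + z * z) for x, y, z in points)
    longest, shortest = max(size), min(size)
    aspect_ratio = longest / shortest if shortest > 0 else math.inf
    return {
        "distance_m": distance,
        "size_m": size,
        "volume_m3": volume,
        "aspect_ratio": aspect_ratio,
        "in_safe_zone": distance <= safe_radius_m,
    }


cluster = [(4.0, 0.0, 0.0), (6.0, 1.0, 0.5), (5.0, 0.5, 0.25), (4.0, 1.0, 0.5)]
attrs = obstacle_attributes(cluster)
print(attrs)
```

For this cluster the bounding box is 2.0 m by 1.0 m by 0.5 m and the nearest point is 4.0 m away, so the obstacle is flagged as inside the assumed 10 m safety zone.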

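Table 10. End-to-end latency breakdown for obstacle avoidance

Stage	Subtasks	Latency range/ms	Influencing factors and optimization directions
Sensor data acquisition	Camera image transmission; LiDAR point cloud generation	5~20	Sensor hardware performance
Risk identification	Macroscopic scene detection (Owl-ViT); sudden obstacle detection (point cloud)	50~80	Model quantization, hardware acceleration (TensorRT)
Decision and planning	Path replanning; dynamic obstacle prediction	10~30	Algorithm complexity
Control execution	Motor response, attitude adjustment	5~15	Flight controller board performance
Total latency		70~145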
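The total in Table 10 is the sum of the per-stage ranges, which the short check below reproduces. It also converts the latency into the distance flown before the avoidance manoeuvre starts. The 10 m/s cruise speed is an assumed example value, not a figure from the paper.

```python
# Per-stage latency ranges in milliseconds, copied from Table 10.
STAGES_MS = {
    "sensor data acquisition": (5, 20),
    "risk identification": (50, 80),
    "decision and planning": (10, 30),
    "control execution": (5, 15),
}


def total_latency_ms(stages):
    """Sum the lower and upper bounds of all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


def distance_flown_m(speed_mps, latency_ms):
    """Distance covered at constant speed while the pipeline is still reacting."""
    return speed_mps * latency_ms / 1000.0


low_ms, high_ms = total_latency_ms(STAGES_MS)
print(low_ms, high_ms)  # 70 145

cruise_speed_mps = 10.0  # assumed example speed
best_case_m = distance_flown_m(cruise_speed_mps, low_ms)
worst_case_m = distance_flown_m(cruise_speed_mps, high_ms)
```

At the assumed 10 m/s, the UAV covers roughly 0.7 m to 1.45 m during the full chain, which gives a sense of the minimum margin the safety zone needs on top of braking or turning distance.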
