Few-shot model for extracting inspection report information based on bridge inspection domain-task transfer

ZHU Yan-jie, WANG Yu-chen, XIONG Wen, CAI Chun-sheng

Citation: ZHU Yan-jie, WANG Yu-chen, XIONG Wen, CAI Chun-sheng. Few-shot model for extracting inspection report information based on bridge inspection domain-task transfer[J]. Journal of Traffic and Transportation Engineering, 2025, 25(1): 248-262. doi: 10.19818/j.cnki.1671-1637.2025.01.018

doi: 10.19818/j.cnki.1671-1637.2025.01.018

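More Information
    Author biography:

    ZHU Yan-jie (1994-), female, born in Suqian, Jiangsu; associate professor at Southeast University; PhD; research interest: digital bridges

    Corresponding author:

    XIONG Wen (1982-), male, born in Hefei, Anhui; professor at Southeast University; doctor of engineering

  • CLC number: U446.3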

Funds: 

National Natural Science Foundation of China 52478147

National Natural Science Foundation of China 52378135

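  • Abstract: To reduce the dependence of key information extraction from bridge inspection reports on large numbers of manually annotated samples, a key information extraction model for few-shot scenarios is proposed. The model consists of a pre-trained language model adapted to the bridge inspection domain, a bidirectional long short-term memory (BiLSTM) network, and a conditional random field (CRF). The original language model is first pre-trained on a bridge-domain corpus and then fine-tuned on inspection task data, giving a two-stage transfer from domain knowledge to task features and yielding a pre-trained language model better adapted to bridge terminology and inspection report formats. The BiLSTM captures contextual dependencies in bridge inspection reports, and the CRF constrains and refines the final extraction results. Based on industry specifications and existing research, eight categories of key information common to bridge inspection reports are redefined. To verify the method, the model is trained on few-shot datasets containing only 50 and 100 sentences and evaluated on a test set of 1491 sentences. Experimental results show that with 50 and 100 training samples, the F1 scores of the proposed model reach 0.8202 and 0.8607, respectively, both significantly higher than those of four mainstream models, which verifies that the model can accurately extract key information from bridge inspection reports in few-shot settings. Ablation experiments further demonstrate that the two-stage domain and task transfer learning strategy is effective in rapidly extracting domain-related information and salient task features from few-shot data, which significantly improves the overall performance of the model in few-shot scenarios. The proposed few-shot extraction method can be used to build knowledge graphs for assessing the structural condition of bridges and predicting their remaining service life.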
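
The role of the CRF layer described in the abstract, constraining the final tag sequence, can be illustrated with a minimal sketch. It is not the authors' implementation: a trained CRF learns soft transition scores, whereas this example assumes a single hard BIO rule (an `I-x` tag may only follow `B-x` or `I-x`), and the tag set and emission scores are hypothetical values chosen for illustration.

```python
import math

# Hypothetical tag set: "be" = bridge component, "d" = defect (assumed labels).
TAGS = ["O", "B-be", "I-be", "B-d", "I-d"]


def allowed(prev, curr):
    """BIO rule: I-x may only follow B-x or I-x of the same type."""
    if curr.startswith("I-"):
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True


def viterbi(emissions, tags=TAGS):
    """Return the highest-scoring tag path that respects the BIO rule.

    emissions: one dict per token, mapping tag -> score.
    """
    # A sequence may not start with an I- tag.
    best = {t: (-math.inf if t.startswith("I-") else emissions[0][t]) for t in tags}
    back = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for curr in tags:
            prev_score, prev_tag = max((best[p], p) for p in tags if allowed(p, curr))
            new_best[curr] = prev_score + scores[curr]
            pointers[curr] = prev_tag
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(best, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical per-token scores, e.g. as produced by a BiLSTM output layer.
emissions = [
    {"O": 0.1, "B-be": 2.0, "I-be": 0.0, "B-d": 0.3, "I-d": 0.0},
    {"O": 0.2, "B-be": 0.1, "I-be": 1.4, "B-d": 0.3, "I-d": 1.5},
    {"O": 2.0, "B-be": 0.1, "I-be": 0.2, "B-d": 0.1, "I-d": 0.1},
]
greedy = [max(s, key=s.get) for s in emissions]  # per-token argmax, no constraint
print(greedy)              # ['B-be', 'I-d', 'O']  (invalid: I-d after B-be)
print(viterbi(emissions))  # ['B-be', 'I-be', 'O']
```

The per-token argmax picks an `I-d` tag that cannot legally follow `B-be`; decoding over whole sequences under the constraint recovers a well-formed span instead.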

     

  • Figure 1. Workflow of key information extraction method

    Figure 2. Transfer workflow of bridge domain

    Figure 3. Transfer workflow of information extraction task

    Figure 4. Network loss during training and validation processes

    Figure 5. Confusion matrix of proposed model on test set

    Figure 6. Performance comparison between proposed model and baseline models

    Figure 7. Recognition performances of various models on key bridge inspection information in ablation experiment

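    Table 1. Categories of key bridge inspection information from existing studies

    | Study | Year | Number of categories | Categories of key information extracted from bridge inspection reports |
    | --- | --- | --- | --- |
    | Ref. [23] | 2022 | 6 | Bridge component, defect, measured value, measurement unit, quantity description, severity description |
    | Ref. [11] | 2021 | 6 | Bridge, bridge component, component part, defect location, defect, defect description |
    | Ref. [10] | 2021 | 5 | Defect, qualitative severity description, quantitative severity description, defect location, other |
    | Refs. [9], [22] | 2017, 2021 | 11 | Bridge component, defect, defect cause, maintenance action, repair material, measured value, measurement unit, quantity description, severity description, date, other |
    | Ref. [12] | 2020 | 3 | Bridge component, defect, cause |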

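    Table 2. Partial samples of training data

    | No. | Sample from the dataset (original Chinese text with English gloss) |
    | --- | --- |
    | 1 | 伸缩缝存在止水胶带破损病害1处,位于1# (The expansion joint has 1 instance of damaged waterstop tape, located at 1#) |
    | 2 | 台身存在竖向裂缝1条,裂缝长度为2.5 m,裂缝宽度为0.2 mm,存在于Y-0#台身大里程面 (The abutment body has 1 vertical crack, 2.5 m long and 0.2 mm wide, on the high-chainage face of abutment body Y-0#) |
    | 3 | 桥面铺装存在网向裂缝32条,分布于2根构件上,网向裂缝总长度为88.66 m (The deck pavement has 32 map cracks distributed over 2 members, with a total length of 88.66 m) |
    | 4 | 小箱梁存在锈胀露筋1处,总面积为0.03 m2 (The small box girder has 1 instance of rust expansion with exposed rebar, total area 0.03 m²) |
    | 5 | 本次检查共发现翼墙、耳墙病害2处,均为破损 (This inspection found 2 defects in the wing walls and ear walls, both being breakage) |
    | 6 | 整体箱梁存在蜂窝、麻面10处,分布于9根构件上,总面积为15.31 m2 (The integral box girder has 10 instances of honeycombing and pitting distributed over 9 members, total area 15.31 m²) |
    | 7 | T梁表层混凝土存在剥落、掉角现象共12处,总面积为121.26 m2 (The surface concrete of the T-girder shows spalling and corner loss at 12 locations, total area 121.26 m²) |
    | 8 | 混凝土表面蜂窝麻面,共4处,总面积为14.74 m2,单处面积介于0.3~9.0 m2之间 (Honeycombing and pitting on the concrete surface at 4 locations, total area 14.74 m², with individual areas between 0.3 and 9.0 m²) |
    | 9 | 锥坡存在缺陷1处,总面积为12.0 m2 (The conical slope has 1 defect, total area 12.0 m²) |
    | 10 | 小箱梁存在横向裂缝130条,分布于22根构件上,横向裂缝总长度为44.8 m (The small box girder has 130 transverse cracks distributed over 22 members, with a total length of 44.8 m) |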

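    Table 3. Number of key information items in the training, validation, and test sets

    | Key information | Test set | Validation set | Training set (100 samples) | Training set (50 samples) |
    | --- | --- | --- | --- | --- |
    | Bridge component | 1801 | 49 | 157 | 84 |
    | Component location | 729 | 14 | 87 | 55 |
    | Defect | 3110 | 76 | 205 | 109 |
    | Defect feature | 888 | 26 | 73 | 42 |
    | Defect quantity | 1646 | 36 | 90 | 38 |
    | Defect distribution | 313 | 8 | 25 | 13 |
    | Measurement category | 976 | 22 | 74 | 37 |
    | Measured value | 1021 | 21 | 76 | 37 |
    | Total | 10484 | 252 | 787 | 415 |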

    Table 4. Model performance under different few-shot datasets

    | Training samples | Precision | Recall | F1 |
    | --- | --- | --- | --- |
    | 50 | 0.7985 | 0.8448 | 0.8202 |
    | 100 | 0.8351 | 0.8891 | 0.8607 |
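
For readers unfamiliar with the three metrics in Table 4, the sketch below computes precision, recall, and F1 for a single sentence. It assumes exact-match, entity-level scoring over `(start, end, label)` spans, a common convention for sequence labeling that this page does not confirm; the spans are hypothetical, and how the authors aggregate scores across categories is not stated here.

```python
def prf1(gold, pred):
    """Exact-match precision, recall, and F1 over sets of (start, end, label) spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # spans with identical boundaries and label
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical annotation and prediction for one sentence.
gold = {(0, 4, "be"), (6, 11, "d"), (12, 16, "d"), (20, 22, "q")}
pred = {(0, 4, "be"), (6, 11, "d"), (12, 15, "d"), (16, 18, "d"), (20, 22, "q")}

precision, recall, f1 = prf1(gold, pred)
print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.6 0.75 0.6667
```

Three of the five predicted spans match a gold span exactly, so precision is 3/5 and recall is 3/4; the span with a wrong right boundary counts as both a false positive and a miss.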

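    Table 5. Extraction performance of proposed model for various types of inspection information

    | Key information category | F1 (50 samples) | F1 (100 samples) | Change rate/% |
    | --- | --- | --- | --- |
    | Bridge component | 0.7278 | 0.7882 | 7.66 |
    | Component location | 0.4928 | 0.6218 | 20.75 |
    | Defect | 0.8399 | 0.8758 | 4.10 |
    | Defect feature | 0.7681 | 0.7687 | 0.08 |
    | Defect quantity | 0.9530 | 0.9702 | 1.77 |
    | Defect distribution | 0.9810 | 0.9810 | 0.00 |
    | Measurement category | 0.9329 | 0.9757 | 4.39 |
    | Measured value | 0.9206 | 0.9490 | 2.99 |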
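
The change rate column is not defined on this page. The tabulated values are consistent with a change measured relative to the 100-sample score, that is, (F1 at 100 samples minus F1 at 50 samples) divided by F1 at 100 samples. This formula is inferred from the numbers, not stated by the authors; the short check below reproduces three rows of Table 5 under that assumption.

```python
def change_rate(score_50, score_100):
    """Percentage change relative to the 100-sample score (inferred definition)."""
    return (score_100 - score_50) / score_100 * 100


# (F1 with 50 samples, F1 with 100 samples, change rate listed in Table 5)
rows = {
    "component location": (0.4928, 0.6218, 20.75),
    "defect": (0.8399, 0.8758, 4.10),
    "measured value": (0.9206, 0.9490, 2.99),
}
for name, (f1_50, f1_100, listed) in rows.items():
    print(f"{name}: computed {change_rate(f1_50, f1_100):.2f}, listed {listed:.2f}")
```

Dividing by the 50-sample score instead would give 26.18% for component location, which does not match the table.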

    Table 6. Performance comparison between proposed model and baseline models

    | Model | Precision (50 samples) | Precision (100 samples) | Precision change rate/% | Recall (50 samples) | Recall (100 samples) | Recall change rate/% | F1 (50 samples) | F1 (100 samples) | F1 change rate/% | K |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Word2Vec-CNN | 0.4250 | 0.4681 | 9.21 | 0.4886 | 0.5200 | 6.03 | 0.4495 | 0.4872 | 7.75 | 0.82 |
    | Word2Vec-BiLSTM | 0.7039 | 0.7708 | 8.67 | 0.7547 | 0.8089 | 6.70 | 0.6935 | 0.7886 | 12.06 | 0.75 |
    | Word2Vec-BiLSTM-CRF | 0.6858 | 0.7375 | 7.02 | 0.7615 | 0.8092 | 5.89 | 0.7181 | 0.7641 | 6.01 | 0.95 |
    | BERT-BiLSTM-CRF | 0.8031 | 0.8107 | 0.94 | 0.6829 | 0.8278 | 17.51 | 0.7381 | 0.8186 | 9.83 | 7.05 |
    | Proposed model | 0.7985 | 0.8351 | 4.38 | 0.8448 | 0.8891 | 4.98 | 0.8202 | 0.8607 | 4.71 | 1.17 |

    Table 7. Model performances in ablation experiment

    | Model | Precision (50 samples) | Precision (100 samples) | Precision change rate/% | Recall (50 samples) | Recall (100 samples) | Recall change rate/% | F1 (50 samples) | F1 (100 samples) | F1 change rate/% |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | BERT-Inspection-BiLSTM | 0.8009 | 0.8494 | 5.72 | 0.8392 | 0.8619 | 2.63 | 0.8186 | 0.8552 | 4.28 |
    | BERT-Inspection-CRF | 0.7158 | 0.8020 | 10.70 | 0.7827 | 0.8813 | 11.20 | 0.7462 | 0.8398 | 11.10 |
    | BERT-BiLSTM-CRF | 0.8031 | 0.8107 | 0.94 | 0.6829 | 0.8278 | 17.51 | 0.7381 | 0.8186 | 9.83 |
    | BERT-Inspection (no domain)-BiLSTM-CRF | 0.7802 | 0.8218 | 5.06 | 0.7916 | 0.8622 | 8.19 | 0.7839 | 0.8409 | 6.78 |
    | BERT-Inspection-BiLSTM-CRF | 0.7985 | 0.8351 | 4.38 | 0.8448 | 0.8891 | 4.98 | 0.8202 | 0.8607 | 4.71 |

    Table 8. F1 scores for different key information categories in ablation experiment

    | Key information | BERT-Inspection-BiLSTM | BERT-Inspection-CRF | BERT-BiLSTM-CRF | BERT-Inspection (no domain)-BiLSTM-CRF | BERT-Inspection-BiLSTM-CRF |
    | --- | --- | --- | --- | --- | --- |
    | Bridge component | 0.7767 | 0.7579 | 0.7350 | 0.7356 | 0.7882 |
    | Component location | 0.4948 | 0.5676 | 0.5453 | 0.4921 | 0.6218 |
    | Defect | 0.8891 | 0.8643 | 0.8701 | 0.8602 | 0.8758 |
    | Defect feature | 0.8035 | 0.7573 | 0.7732 | 0.7820 | 0.7687 |
    | Defect quantity | 0.9573 | 0.9810 | 0.9624 | 0.9737 | 0.9702 |
    | Defect distribution | 0.9745 | 0.9522 | 0.1937 | 0.9825 | 0.9810 |
    | Measurement category | 0.9627 | 0.9790 | 0.9680 | 0.9735 | 0.9757 |
    | Measured value | 0.9683 | 0.9628 | 0.9473 | 0.9580 | 0.9490 |

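    Table 9. Information extraction results of proposed model in five instances

    Instance 1
    Text input: 主拱圈存在横向裂缝1条,裂缝长度为1.1 m,裂缝宽度为0.12 mm (The main arch ring has 1 transverse crack, 1.1 m long and 0.12 mm wide)
    Model output: [B-be, I-be, I-be, O, O, B-dd, I-dd, B-d, I-d, B-q, I-q, O, B-d, I-d, B-mc, I-mc, O, B-m, I-m, I-m, I-m, O, B-d, I-d, B-mc, I-mc, O, B-m, I-m, I-m, I-m, I-m, O]
    Extraction result: bridge component: 主拱圈 (main arch ring); defect feature: 横向 (transverse); defect: 裂缝 (crack); defect quantity: 1条 (1 crack); measurement category: 长度 (length), measured value: 1.1 m; measurement category: 宽度 (width), measured value: 0.12 mm

    Instance 2
    Text input: 湿接缝存在剥落、掉角6处,分布于6根构件上,总面积为0.85 m2 (The wet joints show spalling and corner loss at 6 locations distributed over 6 members, total area 0.85 m²)
    Model output: [B-be, I-be, I-be, O, O, B-d, I-d, O, B-d, I-d, B-q, I-q, O, O, O, O, B-s, I-s, O, O, O, O, O, B-mc, I-mc, O, B-m, I-m, I-m, I-m, I-m, I-m, O]
    Extraction result: bridge component: 湿接缝 (wet joint); defect 1: 剥落 (spalling); defect 2: 掉角 (corner loss); defect quantity: 6处 (6 locations); defect distribution: 6根 (6 members); measurement category: 面积 (area), measured value: 0.85 m²

    Instance 3
    Text input: 支座存在串动和脱空58个,其中脱空58个,脱空率介于15.0%~50.0% (The bearings show shifting and voiding, 58 in total, of which 58 are voided, with void ratios between 15.0% and 50.0%)
    Model output: [B-be, I-be, O, O, B-d, I-d, O, B-d, I-d, B-q, I-q, O, O, O, B-d, I-d, B-q, I-q, O, B-d, I-d, I-d, O, O, B-m, I-m, I-m, I-m, I-m, I-m, I-m, I-m, I-m, O]
    Extraction result: bridge component: 支座 (bearing); defect 1: 串动 (shifting); defect 2: 脱空 (voiding); defect quantity: 58个 (58); measurement category: 脱空率 (void ratio), measured value: 15.0%~50.0%

    Instance 4
    Text input: 排水系统存在排水孔堵塞、排水不畅现象,共3处 (The drainage system shows blocked drain holes and poor drainage at 3 locations)
    Model output: [B-be, I-be, I-be, I-be, O, O, B-d, I-d, I-d, I-d, I-d, O, B-d, I-d, I-d, I-d, O, O, O, O, B-q, I-q, O]
    Extraction result: bridge component: 排水系统 (drainage system); defect 1: 排水孔堵塞 (blocked drain holes); defect 2: 排水不畅 (poor drainage); defect quantity: 3处 (3 locations)

    Instance 5
    Text input: 主塔纵向阻尼器存在油漆脱落现象,共1处,面积为0.02 m2 (The longitudinal damper of the main tower shows paint peeling at 1 location, area 0.02 m²)
    Model output: [B-be, I-be, B-dd, I-dd, B-d, I-d, I-be, O, O, B-dd, I-dd, B-d, I-d, O, O, O, O, B-q, I-q, O, B-mc, I-mc, O, B-m, I-m, I-m, I-m, I-m, I-m, O]
    Extraction result: bridge component: 主塔 (main tower); component location: 纵向阻尼器 (longitudinal damper); defect feature: 油漆 (paint); defect: 脱落 (peeling); defect quantity: 1处 (1 location); measurement category: 面积 (area), measured value: 0.02 m²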
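
The step from the per-character BIO tags in Table 9 to the listed extraction results can be sketched as follows, using Instance 4. This is an illustrative decoder, not the authors' code. The tag meanings (`be` for bridge component, `d` for defect, `q` for defect quantity) are inferred from the extraction results in the table, and the final `O` of the listed model output, presumably a sentence-final token not shown in the text input, is dropped so that tags align one-to-one with characters.

```python
# Tag meanings inferred from Table 9 (an assumption, not given explicitly).
LABELS = {"be": "bridge component", "d": "defect", "q": "defect quantity"}


def decode_bio(chars, tags):
    """Group characters into (label, text) spans from per-character BIO tags."""
    spans, current = [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            current = [tag[2:], ch]  # open a new span
            spans.append(current)
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1] += ch  # extend the open span
        else:
            current = None  # "O", or an I- tag that does not continue the open span
    return [(label, text) for label, text in spans]


text = "排水系统存在排水孔堵塞、排水不畅现象,共3处"
tags = ("B-be I-be I-be I-be O O B-d I-d I-d I-d I-d O "
        "B-d I-d I-d I-d O O O O B-q I-q").split()

entities = decode_bio(text, tags)
for label, span in entities:
    print(f"{LABELS[label]}: {span}")
```

For this input the decoder yields the bridge component 排水系统, the defects 排水孔堵塞 and 排水不畅, and the defect quantity 3处, matching the extraction result listed for Instance 4.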
  • [1] XIA Ye, LEI Xiao-ming, WANG Peng, et al. A data-driven approach for regional bridge condition assessment using inspection reports[J]. Structural Control and Health Monitoring, 2022, 29(4): e2915.
    [2] YANG Jian-xi, YANG Xiao-xia, LI Ren, et al. BERT and hierarchical cross attention-based question answering over bridge inspection knowledge graph[J]. Expert Systems with Applications, 2023, 233: 120896.
    [3] JIANG Ya-li, YANG Gang, LI Hai-jiang, et al. Knowledge driven approach for smart bridge maintenance using big data mining[J]. Automation in Construction, 2023, 146: 104673. doi: 10.1016/j.autcon.2022.104673
    [4] YANG Jian-xi, XIANG Fang-yue, LI Ren, et al. Intelligent bridge management via big data knowledge engineering[J]. Automation in Construction, 2022, 135: 104118.
    [5] KALE A, RICKS B, GANDHI R. New measure to understand and compare bridge conditions based on inspections time-series data[J]. Journal of Infrastructure Systems, 2021, 27(4): 04021037. doi: 10.1061/(ASCE)IS.1943-555X.0000633
    [6] WANG Y C, CAI C S, HAN B, et al. A deep learning-based approach for assessment of bridge condition through fusion of multi-type inspection data[J]. Engineering Applications of Artificial Intelligence, 2024, 128: 107468.
    [7] YANG Xiao-xia, YANG Jian-xi, LI Ren, et al. Complex knowledge base question answering for intelligent bridge management based on multi-task learning and cross-task constraints[J]. Entropy, 2022, 24(12): 1805.
    [8] LIU Heng, ZHANG Yun-feng. Bridge condition rating data modeling using deep learning algorithm[J]. Structure and Infrastructure Engineering, 2020, 16(10): 1447-1460.
    [9] LIU Kai-jian, EL-GOHARY N. Ontology-based semi-supervised conditional random fields for automated information extraction from bridge inspection reports[J]. Automation in Construction, 2017, 81: 313-327. doi: 10.1016/j.autcon.2017.02.003
    [10] LI T S, ALIPOUR M, HARRIS D K. Context-aware sequence labeling for condition information extraction from historical bridge inspection reports[J]. Advanced Engineering Informatics, 2021, 49: 101333. doi: 10.1016/j.aei.2021.101333
    [11] LI Ren, MO Tian-jin, YANG Jian-xi, et al. Bridge inspection named entity recognition via BERT and lexicon augmented machine reading comprehension neural model[J]. Advanced Engineering Informatics, 2021, 50: 101416. doi: 10.1016/j.aei.2021.101416
    [12] MOON S, CHUNG S, CHI S. Bridge damage recognition from inspection reports using NER based on recurrent neural network with active learning[J]. Journal of Performance of Constructed Facilities, 2020, 34(6): 04020119. doi: 10.1061/(ASCE)CF.1943-5509.0001530
    [13] WANG X Y, EL-GOHARY N. Deep learning-based named entity recognition and resolution of referential ambiguities for enhanced information extraction from construction safety regulations[J]. Journal of Computing in Civil Engineering, 2023, 37(5): 04023023. doi: 10.1061/(ASCE)CP.1943-5487.0001064
    [14] ZHANG R C, EL-GOHARY N. A deep neural network-based method for deep information extraction using transfer learning strategies to support automated compliance checking[J]. Automation in Construction, 2021, 132: 103834. doi: 10.1016/j.autcon.2021.103834
    [15] HE Shuan-hai, WANG An-hua, ZHU Zhao, et al. Research progress on intelligent detection technologies of highway bridges[J]. China Journal of Highway and Transport, 2021, 34(12): 12-24. (in Chinese)
    [16] MOON S, LEE G, CHI S. Automated system for construction specification review using natural language processing[J]. Advanced Engineering Informatics, 2022, 51: 101495. doi: 10.1016/j.aei.2021.101495
    [17] LI Chuan-jiang, LI Shao-bo, WANG Huan, et al. Attention-based deep meta-transfer learning for few-shot fine-grained fault diagnosis[J]. Knowledge-Based Systems, 2023, 264: 110345. doi: 10.1016/j.knosys.2023.110345
    [18] ZHANG Hu, GUO Jia-yu, WANG Yu-jie, et al. Judicial nested named entity recognition method with MRC framework[J]. International Journal of Cognitive Computing in Engineering, 2023, 4: 118-126. doi: 10.1016/j.ijcce.2023.03.002
    [19] MOIRANGTHEM D S, LEE M. Hierarchical and lateral multiple timescales gated recurrent units with pre-trained encoder for long text classification[J]. Expert Systems with Applications, 2021, 165: 113898. doi: 10.1016/j.eswa.2020.113898
    [20] HU Zhong-jian, YANG Peng, LI Bing, et al. Biomedical extractive question answering based on dynamic routing and answer voting[J]. Information Processing and Management, 2023, 60(4): 103367. doi: 10.1016/j.ipm.2023.103367
    [21] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv, 2018, DOI: 10.48550/arXiv.1810.04805.
    [22] LIU K J, EL-GOHARY N. Semantic neural network ensemble for automated dependency relation extraction from bridge inspection reports[J]. Journal of Computing in Civil Engineering, 2021, 35(4): 04021007. doi: 10.1061/(ASCE)CP.1943-5487.0000961
    [23] LIU K J, EL-GOHARY N. Improved similarity assessment and spectral clustering for unsupervised linking of data extracted from bridge inspection reports[J]. Advanced Engineering Informatics, 2022, 51: 101496. doi: 10.1016/j.aei.2021.101496
    [24] LAO Wu-lüe, CUI Chuang, ZHANG Deng-ke, et al. Crack identification method of orthotropic steel deck based on computer vision[J]. China Journal of Highway and Transport, 2023, 36(3): 188-201. doi: 10.3969/j.issn.1001-7372.2023.03.016 (in Chinese)
    [25] JING Qiang, ZHENG Shun-chao, LIANG Peng, et al. Technologies and engineering practices of intelligent operation and maintenance of Hong Kong-Zhuhai-Macao Bridge[J]. China Journal of Highway and Transport, 2023, 36(6): 143-156. (in Chinese)
    [26] VO S N, VO T T, LE B. Interpretable extractive text summarization with meta-learning and Bi-LSTM: a study of meta learning and explainability techniques[J]. Expert Systems with Applications, 2024, 245: 123045.
    [27] LI Dai-yi, YAN Li, YANG Jian-zhong, et al. Dependency syntax guided BERT-BiLSTM-GAM-CRF for Chinese NER[J]. Expert Systems with Applications, 2022, 196: 116682.
    [28] ZHOU Shang-lian, SONG Wei. Deep learning-based roadway crack classification using laser-scanned range images: a comparative study on hyperparameter selection[J]. Automation in Construction, 2020, 114: 103171.
    [29] LAMBRECHTS G, DE GEETER F, VECOVEN N, et al. Warming up recurrent neural networks to maximise reachable multistability greatly improves learning[J]. Neural Networks, 2023, 166: 645-669.
    [30] LIU Ya-fei, WEI Si-qi, HUANG Hai-jun, et al. Naming entity recognition of citrus pests and diseases based on the BERT-BiLSTM-CRF model[J]. Expert Systems with Applications, 2023, 234: 121103.
Publication history
  • Received: 2024-01-30
  • Published: 2025-02-25
