Citation: TIAN Jiapeng, SONG Hui, CHEN Lifan, SHENG Gehao, JIANG Xiuchen. Entity Recognition Approach of Equipment Failure Text for Knowledge Graph Construction[J]. Power System Technology, 2022, 46(10): 3913-3922. DOI: 10.13335/j.1000-3673.pst.2021.1886

Entity Recognition Approach of Equipment Failure Text for Knowledge Graph Construction

    Abstract: The operation and maintenance of power equipment has produced a large body of failure texts that contain important entity information. These texts have fuzzy entity boundaries and many technical terms, so traditional entity recognition methods train inefficiently and their accuracy is hard to improve. This paper therefore proposes a new entity recognition method, I-BRC (integrated algorithm of BERT based BiRNN with CRF). The method uses a character embedding model to convert the text, character by character, into a sequence of character vectors, which avoids the error accumulation introduced by word segmentation. A recurrent neural network combined with a probabilistic graphical model extracts sequence features from the text. Multiple single-type entity recognizers are integrated, each learning the features of one entity type independently, and a parallel pre-training mechanism improves training efficiency. Finally, a multi-type recognizer integrates the recognition results. In addition, adjusting the single-type entity recognizers lets the method adapt flexibly to entity recognition tasks for different power equipment, avoiding repeated training and saving computing resources. Experiments show that I-BRC converges after only 3 iterations, a substantial gain in training efficiency, and that its F1 score, precision, and recall reach 88.0%, 86.8%, and 89.2%, respectively. Compared with traditional models, performance improves by 7.5% to 29.3% (the journal's English abstract gives 19.5% to 28.8%), which verifies the effectiveness and feasibility of the proposed model.
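The integration step can be illustrated with a small sketch. The abstract does not say how the multi-type recognizer combines the outputs of the single-type recognizers, so the rule below (take the union of all spans and keep the longer span when two overlap) is only an assumption for illustration, not the authors' method. The entity type names `equipment` and `fault`, the B/I/O tag scheme, the sample sentence, and the helpers `bio_to_spans`, `merge_recognizers`, and `f1_score` are likewise hypothetical. The last function simply checks that the reported precision and recall are consistent with the reported F1 score.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str], entity_type: str) -> List[Span]:
    """Convert one recognizer's per-character B/I/O tags into entity spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:  # a new B closes the previous span
                spans.append((start, i, entity_type))
            start = i
        elif tag == "I":
            if start is None:  # tolerate an I without a preceding B
                start = i
        else:  # "O" closes any open span
            if start is not None:
                spans.append((start, i, entity_type))
                start = None
    if start is not None:
        spans.append((start, len(tags), entity_type))
    return spans


def merge_recognizers(outputs: Dict[str, List[str]]) -> List[Span]:
    """Union the spans of all single-type recognizers.

    Assumed conflict rule: when spans overlap, the longer one wins
    (ties go to the leftmost span).
    """
    candidates: List[Span] = []
    for entity_type, tags in outputs.items():
        candidates.extend(bio_to_spans(tags, entity_type))
    candidates.sort(key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[Span] = []
    for span in candidates:
        if all(span[1] <= k[0] or span[0] >= k[1] for k in kept):
            kept.append(span)
    return sorted(kept)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "主变压器油温过高"  # hypothetical 8-character failure description
    outputs = {
        "equipment": ["B", "I", "I", "I", "O", "O", "O", "O"],
        "fault": ["O", "O", "O", "O", "B", "I", "I", "I"],
    }
    for start, end, entity_type in merge_recognizers(outputs):
        print(entity_type, text[start:end])
    # Precision 86.8% and recall 89.2% give an F1 of about 88.0%.
    print(round(f1_score(0.868, 0.892), 3))
```

Because each single-type recognizer emits its own tag sequence, a recognizer can be added, retrained, or removed without touching the others, which is the flexibility the abstract describes.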

     
