JIANG Chen, WANG Yuan, HU Junhua, XU Jiquan, CHEN Min, WANG Yawen, MA Guoming. Power Entity Information Recognition Based on Deep Learning[J]. Power System Technology, 2021, 45(6): 2141-2149. DOI: 10.13335/j.1000-3673.pst.2020.1678

Power Entity Information Recognition Based on Deep Learning

  • Abstract: Power equipment accumulates a large volume of text records over long-term operation and maintenance. These records describe common fault locations, fault phenomena, and troubleshooting methods, but because they are usually written in unstructured form, automatically and accurately mining the information they contain is difficult. This paper proposes a new power entity information recognition method, PBERTBiLC (PowerBERT+Bi-LSTM+CRF). The method first uses pre-training to initialize the parameters of a general-purpose BERT model, producing PowerBERT (a power-domain BERT). PowerBERT then serves as the character-vector semantic encoding layer, Bi-LSTM as the per-character entity label prediction layer, and CRF as the global label optimization layer; together they form a power entity information recognition model that identifies information in power texts with high accuracy. Entity recognition was performed on 560 field troubleshooting texts for power equipment. Across the different entity categories, the F1 value of the PBERTBiLC-based method is 15.75% to 34.38% higher than that of a method based on a dictionary and the backward maximum matching algorithm, and 2.33% to 11.25% higher than that of the commonly used word2vec+Bi-LSTM+CRF.
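The role of the CRF as a global label optimization layer can be illustrated with a small Viterbi decoding sketch. This is not the authors' implementation: the label set (`O`, `B-LOC`, `I-LOC`), the per-character emission scores, and the single transition penalty are hypothetical, hand-set values chosen for illustration. In the model described above, the emission scores would presumably come from the Bi-LSTM running on PowerBERT character vectors, and the transition scores would be learned.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO labels for one entity type (e.g. a fault location).
LABELS = ["O", "B-LOC", "I-LOC"]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the label sequence with the highest total score.

    Total score = sum of per-character emission scores plus the
    transition score between each pair of adjacent labels
    (missing transitions default to 0.0).
    """
    # best[t][y]: best score of any path ending with label y at position t
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions.get((p, y), 0.0),
            )
            scores[y] = (
                best[-1][prev]
                + transitions.get((prev, y), 0.0)
                + emissions[t][y]
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Assumed emission scores for a three-character span (illustrative only).
emissions = [
    {"O": 0.6, "B-LOC": 0.5, "I-LOC": 0.1},
    {"O": 0.1, "B-LOC": 0.2, "I-LOC": 0.9},
    {"O": 0.8, "B-LOC": 0.1, "I-LOC": 0.1},
]
# Assumed transition score: an I- tag may not directly follow O.
transitions = {("O", "I-LOC"): -1e4}

# Per-character argmax ignores label dependencies.
greedy = [max(LABELS, key=e.get) for e in emissions]
decoded = viterbi_decode(emissions, transitions, LABELS)

print(greedy)   # ['O', 'I-LOC', 'O']      (invalid: I-LOC follows O)
print(decoded)  # ['B-LOC', 'I-LOC', 'O']  (globally consistent)
```

Picking the best label independently for each character yields an `I-LOC` that follows `O`, which is not a valid BIO sequence. Decoding over the whole sequence with transition scores repairs this, which is the kind of correction a global label optimization layer is meant to provide.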

     
