Abstract:
Long-term operation and maintenance of power equipment has produced a large number of field case records, which describe common fault locations, fault phenomena, and troubleshooting methods. Because these power texts are usually recorded in unstructured form, mining electric power information from them automatically and accurately is difficult. This paper proposes a new power entity information recognition method, PowerBERT+Bi-LSTM+CRF (PBERT-BiLC). The method first uses pre-training to initialize the parameters of the general BERT model, forming PowerBERT (electric BERT). PowerBERT then serves as the word-vector semantic encoding layer, Bi-LSTM as the character-level entity label prediction layer, and CRF as the global label optimization layer; together they form a power entity information extraction model that extracts power text information with high accuracy. Entity information extraction is performed on 560 on-site electrical equipment troubleshooting texts. The results show that the F1 value of the proposed method, which is based on BERT dynamic word vector embedding, is 15.75% to 34.38% higher than that of the method based on a dictionary and the maximum backward matching algorithm, and 2.33% to 11.25% higher than that of the commonly used word2vec+Bi-LSTM+CRF.
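
To illustrate what a global label optimization layer contributes, the following is a minimal sketch of Viterbi decoding over per-character label scores, the standard way a linear-chain CRF selects its output sequence. It is not the paper's implementation: the B/I/O tag set, the function name `viterbi_decode`, and all numeric scores are assumptions made for this example. In a model like the one described, the emission scores would come from the Bi-LSTM layer and the transition scores would be learned during training.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the label sequence with the highest total score.

    emissions[t][y] is the score of label y at character t;
    transitions[(p, y)] is the score of moving from label p to label y.
    """
    # best[t][y]: best score of any path that ends with label y at position t
    best = [{y: emissions[0][y] for y in labels}]
    back = []  # back[t-1][y]: best previous label for label y at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, y)])
            scores[y] = best[-1][prev] + transitions[(prev, y)] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and toy scores for a three-character span.
labels = ["B", "I", "O"]
emissions = [
    {"B": 1.0, "I": 0.0, "O": 1.2},
    {"B": 0.0, "I": 2.0, "O": 0.5},
    {"B": 0.0, "I": 0.2, "O": 1.5},
]
# All transitions are neutral except O -> I, which is penalized because an
# entity cannot continue without having started.
transitions = {(p, y): 0.0 for p in labels for y in labels}
transitions[("O", "I")] = -10.0

# Per-character argmax ignores transitions and yields an invalid sequence.
greedy = [max(labels, key=lambda y: scores[y]) for scores in emissions]
decoded = viterbi_decode(emissions, transitions, labels)
print("greedy :", greedy)
print("viterbi:", decoded)
```

With these toy scores, choosing the best label for each character independently produces the invalid sequence O, I, O, whereas decoding over the whole sequence returns B, I, O, because the penalty on the O to I transition outweighs the small emission advantage of O at the first character.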