Abstract:
During the operation and maintenance of power equipment, technicians have accumulated a large volume of failure texts that contain essential entity information. However, these texts have fuzzy entity boundaries and many professional terms, so traditional entity recognition methods train inefficiently and perform poorly. To address this, an integrated algorithm combining a BERT-based BiRNN with CRF (I-BRC) is proposed. The algorithm employs a word embedding model to convert each word in the text into a sequence of embedding vectors, which avoids the error accumulation caused by word segmentation. Recurrent neural networks combined with probabilistic graphical models are introduced to extract sequence features from the text. Multiple single-type entity recognizers are integrated so that the features of each entity type are learned independently, and a parallel pre-training mechanism is employed to improve training efficiency. Finally, a multi-type recognizer integrates the recognition results. In addition, the single-type entity recognizers can be adjusted to handle different power equipment failure texts flexibly, which avoids repeated training and saves computational resources. Experiments show that the proposed algorithm reaches a stable state after 3 iterations, a significant improvement in training efficiency, and achieves an F1 score, precision, and recall of 88.0%, 86.8%, and 89.2%, respectively. Compared with traditional models, performance improves by 19.5% to 28.8%, which verifies the effectiveness and feasibility of the proposed model.
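
As a quick consistency check on the reported metrics, the sketch below recomputes the F1 score from the stated precision and recall, assuming the standard definition of F1 as the harmonic mean of the two. The function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
reported_precision = 0.868
reported_recall = 0.892

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.880"
```

Under this definition the recomputed value rounds to 88.0%, which agrees with the F1 score stated above.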