Abstract:
In recent years, high-quality development and digital transformation have become increasingly prominent in the power industry. This places new requirements on research into the digital transformation of power standards and brings new challenges and opportunities for their management, implementation, and supervision. Electric power is an important support for social and economic development, and the terminology and proper nouns of the field are highly specialized and complex. When applied to standard documents in the electric power field, traditional named entity recognition methods based on rules and feature engineering suffer from low recognition accuracy, difficulty in segmenting terms, and reliance on expert experience. To overcome these problems, this paper proposes an improved BERT-based named entity recognition model. By introducing a power terminology corpus, word features, and domain lexical information, the model identifies 10 types of power entities on a power standard corpus and reaches an F1 score of 81%. It effectively recognizes long terminology entities in the electric power field, improves the efficiency and accuracy of processing power standard documents, and supports the information processing and application of power standards. This work can promote the automatic processing of power standard documents, raise the digitalization level of the power industry, and provide technical support for specification formulation, knowledge management, and decision support in the industry.