Abstract:
To address the challenges of knowledge acquisition posed by the specialized and interdisciplinary nature of electric power science and technology texts, a power-technology language model is proposed to achieve more accurate text representation. The Transformer-based language model is pre-trained on a large-scale corpus of power-technology papers, patents, project reports, and other texts. Two tasks, power science and technology term classification and distantly supervised entity relation extraction, are used to evaluate the model. Experimental results show that the F1-score of the proposed domain language model on the term classification task is more than 10% higher than that of the word2vec baseline, and its AUC on the entity relation extraction task is about 2% higher than that of the BERT baseline. The proposed language model can thus provide higher-quality feature representations for downstream knowledge acquisition tasks.
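The abstract reports results in terms of F1-score (for term classification) and AUC (for relation extraction). As a point of reference, a minimal self-contained sketch of how these two metrics are computed is given below; the label and score arrays in the usage comments are illustrative and are not the paper's data.

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def auc_score(y_true, scores):
    """AUC: probability a random positive is scored above a random negative
    (ties count as 0.5), equivalent to the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return 0.0
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative example (not the paper's data):
# f1_score([1, 1, 0, 0], [1, 0, 1, 0]) -> 0.5
# auc_score([1, 1, 0, 0], [0.9, 0.8, 0.4, 0.1]) -> 1.0
```

In multi-class term classification, F1 is typically averaged across classes (macro or micro); the abstract does not specify which averaging the paper uses.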