Citation: TIAN Xue-han, DONG Kun, ZHAO Jian-feng, GUO Xi-rui. Entity Recognition Method for Power Data Based on Enhanced Optimization Pre-trained Language Model[J]. Smart Power, 2024, 52(6): 100-107.

Entity Recognition Method for Power Data Based on Enhanced Optimization Pre-trained Language Model

  • Abstract: Knowledge graphs can effectively integrate multi-source data in power systems and improve the level of knowledge management in the grid. Power text datasets are scarce, their entity types are diverse, and the domain is highly specialized. To address these characteristics, a power data entity recognition method based on an enhanced optimization pre-trained language model is proposed. The method expands the original dataset through data augmentation by entity bag-of-words replacement, uses the enhanced optimization pre-trained language model (RoBERTa) for dynamic semantic encoding, and applies a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF) to extract features and optimize the labels. Experimental results show that the method achieves an F1 score 2.17% higher than traditional deep-learning-based entity recognition methods, confirming its recognition effectiveness for constructing power data knowledge graphs.
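
The abstract names entity bag-of-words replacement as the augmentation step but does not spell out the procedure. The following is a minimal sketch of one plausible reading, not the authors' implementation: each labeled entity is swapped for another entity of the same type drawn from a per-type "bag" collected from the corpus. The span-level sentence representation, the label names EQUIPMENT and FAULT, the sample sentences, and the function names are all hypothetical and chosen only for illustration.

```python
import random


def build_entity_bags(sentences):
    """Collect, per entity type, every entity text seen in the corpus."""
    bags = {}
    for sentence in sentences:
        for text, label in sentence:
            if label != "O":  # "O" marks text outside any entity
                bags.setdefault(label, set()).add(text)
    # Sort so that sampling is reproducible for a fixed random seed.
    return {label: sorted(texts) for label, texts in bags.items()}


def augment(sentence, bags, rng):
    """Return a copy of the sentence with each entity replaced by a
    randomly chosen entity of the same type (possibly the same one)."""
    augmented = []
    for text, label in sentence:
        if label == "O":
            augmented.append((text, label))
        else:
            augmented.append((rng.choice(bags[label]), label))
    return augmented


# Hypothetical toy corpus: each sentence is a list of (text, label) spans.
corpus = [
    [("The ", "O"), ("transformer", "EQUIPMENT"),
     (" reported ", "O"), ("overheating", "FAULT")],
    [("Inspect the ", "O"), ("circuit breaker", "EQUIPMENT"),
     (" after ", "O"), ("tripping", "FAULT")],
]

bags = build_entity_bags(corpus)
rng = random.Random(0)  # fixed seed, assumed here for reproducibility
augmented = [augment(sentence, bags, rng) for sentence in corpus]

for sentence in augmented:
    print("".join(text for text, _ in sentence))
```

Because labels are carried over unchanged, each generated sentence stays correctly annotated. In a setup like the one the abstract describes, the span-level labels would then be converted to per-token tags before being passed to the RoBERTa encoder and the BiLSTM and CRF layers.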

     
