LI Yuan, LI Rui, LIN Jinshan, JIN Lingfeng, SHAO Xianjun, ZHANG Guanjun. Character-word level ensemble integrated model for power transformer defect recording text mining method[J]. Electric Power Engineering Technology, 2024, 43(6): 153-162. DOI: 10.12158/j.2096-3203.2024.06.015


Character-word level ensemble integrated model for power transformer defect recording text mining method


Abstract: Transformer operation and maintenance has accumulated a large volume of unstructured defect records in text form, but the lack of effective mining methods has left this data largely unused. This paper proposes a text mining method for transformer defect records based on a character-word level ensemble integrated model. First, the defect texts are preprocessed through word segmentation, stop-word removal, text augmentation, and text feature representation, which converts them into numerical vectors for input. The method then integrates multiple word-level and character-level classification models as base learners, and a meta-learner combines their complementary strengths to identify and classify transformer defect types accurately. Compared with a single text classification algorithm, the method captures the semantic features of the text more comprehensively, achieving a classification precision of 91% and an F1 score (the combined measure of precision and recall) of 0.9. Applying natural language processing to power equipment defect records enables accurate and efficient classification and fault identification, puts dormant data resources to use, and significantly improves the intelligent management of power transformers.
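The abstract describes F1 as a combined score of precision and recall. As a minimal sketch of how that score is usually computed (the harmonic mean of the two), the snippet below derives all three metrics from raw counts for one class. The function name `precision_recall_f1` and the counts in the example are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the paper, and the paper may average the metrics across defect classes differently.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one class from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    # Precision: share of predicted positives that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the paper).
p, r, f1 = precision_recall_f1(tp=91, fp=9, fn=11)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of the two inputs, so a high F1 requires both precision and recall to be high.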

     
