Ensemble Learning Method for Large-Scale Power Transformer Status Evaluation Based on Imbalanced Data

韩笑; 王新迎; 韩帅; 张玉天; 王继业

doi:10.13335/j.1000-3673.pst.2019.2180

Abstract

Large power transformers are structurally complex, costly, and a key component of the power system; their operating state is closely tied to the system's safety and stability, so transformer status evaluation has become a routine operation and maintenance task. Current evaluation work, however, relies heavily on guidelines and expert experience, which makes it labor-intensive and vulnerable to subjective bias, while existing models tend to apply standard algorithms directly and perform poorly in real production environments. To address the problems of data quality, sample distribution, application requirements, and model performance in large power transformer status evaluation, this paper proposes a new evaluation model. First, invalid samples are removed and a cross-weight method is designed to label the valid samples. Second, status variables are separated according to data completeness, feature extraction and high-dimensional mapping are applied to them, and the dataset is then split into multiple complete training datasets. Third, the SMOTE-BORDERLINE algorithm is applied to synthesize positive samples, yielding multiple complete, balanced training datasets. Finally, multiple cost-sensitive support vector machine (SVM) component learners are trained in parallel and combined into an ensemble learner by weighted voting. The proposed model accounts for the effects of dataset imbalance and cost sensitivity and uses ensemble learning to improve generalization. Validated in a real production environment, it performs well and, compared with traditional methods, significantly reduces both the misjudgment rate and the miss rate for abnormal-status samples.
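The oversampling and voting steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the textbook Borderline-SMOTE rule (a minority sample is "in danger" when at least half, but not all, of its m nearest neighbours belong to the majority class) and a plain weighted vote, and it omits the cost-sensitive SVM training entirely. The function names `danger_points`, `borderline_smote`, and `weighted_vote`, the parameters `m`, `k`, and `seed`, and the convention that label 1 means abnormal status are all hypothetical choices made for illustration.

```python
import math
import random


def danger_points(minority, majority, m=3):
    """Return minority points whose m nearest neighbours are mostly, but not
    entirely, majority points (the Borderline-SMOTE 'danger' set)."""
    labelled = [(x, 1) for x in minority] + [(x, 0) for x in majority]
    danger = []
    for i, p in enumerate(minority):
        # Minority points come first in `labelled`, so index i is p itself.
        others = labelled[:i] + labelled[i + 1:]
        nearest = sorted(others, key=lambda s: math.dist(p, s[0]))[:m]
        n_major = sum(1 for _, label in nearest if label == 0)
        if m / 2 <= n_major < m:
            danger.append(p)
    return danger


def borderline_smote(minority, majority, n_new, m=3, k=2, seed=0):
    """Synthesize n_new minority samples by interpolating between a danger
    point and one of its k nearest minority neighbours.
    Assumes at least two distinct minority points (illustrative only)."""
    rng = random.Random(seed)
    danger = danger_points(minority, majority, m)
    synthetic = []
    if not danger:
        return synthetic
    for _ in range(n_new):
        p = rng.choice(danger)
        pool = [q for q in minority if q != p]
        neighbours = sorted(pool, key=lambda q: math.dist(p, q))[:k]
        q = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from p to q
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


def weighted_vote(predictions, weights):
    """Combine 0/1 predictions of component learners: return 1 when the
    learners voting 1 hold at least half of the total weight."""
    positive = sum(w for p, w in zip(predictions, weights) if p == 1)
    return 1 if positive >= sum(weights) / 2 else 0
```

In a fuller pipeline each balanced training set would feed one component classifier, and the weights passed to `weighted_vote` would typically come from each component's validation performance; how the paper derives its weights is not stated in the abstract.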
