Abstract:
With the digital transformation and upgrading of power grids, intelligent operation and maintenance technology for power equipment has developed rapidly. During operation and maintenance, a large number of defect texts containing important information about the power grid have accumulated. However, because labels for this text data are sparse and the written descriptions are ambiguous and varied, it is difficult to mine operation and maintenance information from power texts effectively. A data augmentation method for power equipment defect texts is proposed. First, the defect text data sets are used to fine-tune the pre-trained model ERNIE (enhanced representation through knowledge integration) with a multi-stage knowledge masking strategy, integrating electrical expertise into the dynamic encoding of defect texts. Second, based on the manifold assumption, destruction and reconstruction functions are designed following the denoising autoencoder framework. The destruction function is built on a mask unit selection strategy driven by information value, and the reconstruction function is built on the fine-tuned ERNIE. Augmented samples are obtained by applying destruction followed by reconstruction. Third, the augmented data are screened using the influence function and diversity measures, filtering out samples of poor quality or high repetition. Finally, the augmented data are applied to various text mining tasks through a multi-layer training framework. Results show that the algorithm substantially improves the performance of defect text mining and can be applied broadly and flexibly to a variety of power equipment defect text mining tasks.
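The destruction, reconstruction, and diversity-screening steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption: information value is approximated by an inverse-document-frequency score, the lowest-scoring tokens are the ones masked, the fine-tuned ERNIE is replaced by a hypothetical `fill_mask` callable, diversity is measured by Jaccard similarity, and the influence-function screening is omitted. All function names, parameters, and the toy corpus are hypothetical.

```python
import math
from collections import Counter

MASK = "[MASK]"


def information_values(corpus):
    """Score tokens by smoothed inverse document frequency.

    Assumption: IDF stands in for the "information value" named in the abstract.
    """
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def destroy(tokens, scores, mask_ratio=0.3):
    """Destruction function: mask the lowest-scoring tokens.

    Assumption: low-information tokens are masked so that key defect terms
    survive; the abstract does not state the selection direction.
    """
    n_mask = max(1, int(len(tokens) * mask_ratio))
    # Sort positions by (score, position) so ties are broken deterministically.
    order = sorted(range(len(tokens)),
                   key=lambda i: (scores.get(tokens[i], 0.0), i))
    masked = set(order[:n_mask])
    return [MASK if i in masked else tok for i, tok in enumerate(tokens)]


def reconstruct(masked_tokens, fill_mask):
    """Reconstruction function: fill each masked position.

    `fill_mask(tokens, position)` is a hypothetical stand-in for a masked
    language model such as a fine-tuned ERNIE.
    """
    return [fill_mask(masked_tokens, i) if tok == MASK else tok
            for i, tok in enumerate(masked_tokens)]


def jaccard(a, b):
    """Jaccard similarity between the token sets of two texts."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def select_diverse(candidates, originals, max_sim=0.9):
    """Keep candidates that are not near-duplicates of any original or kept text."""
    kept = []
    for cand in candidates:
        if all(jaccard(cand, other) < max_sim for other in originals + kept):
            kept.append(cand)
    return kept


# Toy usage with an invented corpus and a trivial filler in place of ERNIE.
toy_corpus = [
    ["transformer", "oil", "leak", "at", "valve"],
    ["breaker", "oil", "leak", "at", "flange"],
    ["transformer", "bushing", "crack", "at", "top"],
]
toy_scores = information_values(toy_corpus)
toy_masked = destroy(toy_corpus[0], toy_scores)
toy_augmented = reconstruct(toy_masked, lambda toks, i: "near")
toy_selected = select_diverse([toy_augmented], toy_corpus)
print(toy_masked, toy_augmented, toy_selected)
```

In practice the filler would sample from a masked language model's predictions, so that repeated passes over one defect text yield different reconstructions; the diversity filter then discards those that differ too little from existing samples.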