ZHOU Jing, WANG Manyi, TIAN Zhaoxing. Multi-label Classification of Defective Insulator Images Based on Multimodality[J]. High Voltage Engineering, 2025, 51(2): 642-651. DOI: 10.13336/j.1003-6520.hve.20232230

Multi-label Classification of Defective Insulator Images Based on Multimodality

Abstract: Accurate classification of insulator defects in inspection images is one of the key technologies for automatic inspection of transmission lines. Traditional deep-learning classification methods make insufficient use of textual information, and insulator images are typically annotated with only a single classification label. To address these problems, this paper proposes, for the first time, a multimodal multi-label classification method for defective insulator images. First, a multimodal joint data augmentation method augments insulator images and their label texts together, achieving cross-modal data augmentation. Then, a Vision Transformer (ViT) network extracts image features and a BERT network extracts label-text features; drawing on the features of both modalities gives the network more comprehensive information and improves its classification ability. Finally, contrastive learning associates the image features with the text features, which makes the classification more reliable and also makes the classification results more interpretable. Experimental results show that the method reaches an overall classification accuracy of 93.87% and clearly outperforms the other models compared on the same dataset, providing a basis for applying multimodal techniques in the power grid domain.
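The abstract does not state the exact loss or decision rule, so the following is only a minimal sketch, in plain Python, of how contrastive image-text matching can produce multi-label predictions. It assumes the ViT and BERT encoders have already mapped an image and each label text into a shared embedding space (toy 3-dimensional vectors stand in for them here), and it assumes a sigmoid (binary cross-entropy) matching loss, similar in spirit to the sigmoid loss used in SigLIP, rather than a softmax, because several defects can appear in one image. The temperature of 0.07, the 0.5 threshold, the label names, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_logits(image_emb: Vector, label_embs: Sequence[Vector],
                  temperature: float = 0.07) -> List[float]:
    """Cosine similarity between one image and each label text, divided by a temperature."""
    img = l2_normalize(image_emb)
    logits = []
    for lab in label_embs:
        if len(lab) != len(img):
            raise ValueError("image and label embeddings must share one dimension")
        cos = sum(a * b for a, b in zip(img, l2_normalize(lab)))
        logits.append(cos / temperature)
    return logits


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multilabel_contrastive_loss(image_emb: Vector, label_embs: Sequence[Vector],
                                targets: Sequence[int], temperature: float = 0.07) -> float:
    """Mean binary cross-entropy over image-label pairs (assumed sigmoid-style loss).

    targets[k] is 1 if label k is present in the image, else 0. Each pair is
    scored independently, so several labels can be positive at once.
    """
    logits = cosine_logits(image_emb, label_embs, temperature)
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def predict_labels(image_emb: Vector, label_embs: Sequence[Vector],
                   label_names: Sequence[str], threshold: float = 0.5,
                   temperature: float = 0.07) -> List[str]:
    """Return every label whose matching probability reaches the threshold."""
    logits = cosine_logits(image_emb, label_embs, temperature)
    return [name for name, z in zip(label_names, logits) if sigmoid(z) >= threshold]


# Toy usage: 3-D vectors stand in for ViT (image) and BERT (label text) outputs.
LABELS = ["flashover", "damaged", "missing cap"]  # hypothetical label texts
LABEL_EMBS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
IMAGE_EMB = [0.8, 0.6, -0.2]
print(predict_labels(IMAGE_EMB, LABEL_EMBS, LABELS))  # ['flashover', 'damaged']
```

Because each image-label pair is scored on its own, an image can receive zero, one, or several labels; with a softmax over labels, raising one label's score would necessarily lower the others, which suits single-label classification rather than the multi-label setting described above.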