Abstract:
Accurate classification of insulator defects in inspection images is a key technology for the automatic inspection of transmission lines. Traditional deep learning classification methods make insufficient use of textual information, and existing insulator image classification labels are relatively simplistic. To address both issues, this paper proposes, for the first time, a multimodal multi-label classification method for defective insulator images. First, a multimodal joint data augmentation method is applied to achieve cross-modal data enhancement between insulator images and label texts. Next, a Vision Transformer network extracts features from the images and a BERT network extracts features from the label texts, so that the network draws on information from both modalities and its classification capability improves. Finally, contrastive learning correlates the image and text features, which improves the reliability of the classification and gives the results good interpretability. Experimental results show that the method achieves an overall accuracy of 93.87%, a clear advantage in classification performance over other models on the same dataset.
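
The contrastive step that links image and label-text features can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a CLIP-style symmetric InfoNCE loss with a temperature of 0.07, cosine similarity as the matching score, and a fixed decision threshold of 0.5, none of which are stated in the abstract. The embeddings stand in for Vision Transformer and BERT outputs, and the label names (`broken`, `flashover`, `normal`) are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def _mean_diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        mx = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum = mx + math.log(sum(math.exp(x - mx) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss (assumed form): image i is paired with text i."""
    logits = [[cosine_similarity(img, txt) / temperature for txt in text_embs]
              for img in image_embs]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_mean_diagonal_cross_entropy(logits)
                  + _mean_diagonal_cross_entropy(transposed))

def predict_labels(image_emb, label_embs, threshold=0.5):
    """Multi-label prediction: keep every label whose text embedding is
    similar enough to the image embedding. The threshold is an assumption."""
    scores = {name: cosine_similarity(image_emb, emb)
              for name, emb in label_embs.items()}
    predicted = {name for name, score in scores.items() if score >= threshold}
    return predicted, scores

# Toy embeddings standing in for ViT (image) and BERT (label text) features.
label_embs = {
    "broken": [1.0, 0.0, 0.0],     # hypothetical label
    "flashover": [0.0, 1.0, 0.0],  # hypothetical label
    "normal": [0.0, 0.0, 1.0],     # hypothetical label
}
predicted, scores = predict_labels([1.0, 1.0, 0.0], label_embs)
```

Because each label is scored independently against the image, one image can receive several labels at once, and the per-label similarity scores indicate which label texts drove the decision.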