Citation: ZHANG Ke, ZHENG Zhaoye, SHI Chaojun, ZHAO Zhenbing, XIAO Yangjie. Transmission Line Bolt Defects Classification Based on Multi-modal Contrastive Learning[J]. High Voltage Engineering, 2025, 51(2): 630-641. DOI: 10.13336/j.1003-6520.hve.20232123

Transmission Line Bolt Defects Classification Based on Multi-modal Contrastive Learning

Abstract: Bolt images collected during transmission line inspection are typically low in resolution and carry little visual information, so traditional image classification models struggle to learn semantically rich visual representations from them. To address this problem, this paper proposes a bolt defect classification method based on multi-modal contrastive learning. First, to inject bolt-related semantic information and prior knowledge from text into the visual representation in a cross-modal manner, a two-stage training algorithm is proposed that combines multi-modal contrastive pre-training with supervised fine-tuning. Second, to alleviate overfitting during multi-modal contrastive pre-training, an info noise contrastive estimation loss with label smoothing (infoNCE-LS) is proposed to improve the generalization of the pre-trained visual representation. Finally, to address the mismatch between the upstream and downstream tasks, three types of classification heads based on text prompts are designed to improve how well the pre-trained visual representation transfers in the supervised fine-tuning stage. Experimental results show that the two models built on ResNet50 and ViT achieve accuracies of 92.3% and 97.4%, respectively, on the bolt defect classification dataset, which is 2.4% and 5.8% higher than their respective baselines. The study achieves a cross-modal supplement of semantic information from text to images and offers a new approach for research on bolt defect identification.
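The abstract names the infoNCE-LS loss but does not give its formula. The following is a minimal sketch of one plausible reading, assuming a symmetric CLIP-style image-text contrastive loss in which the one-hot matching targets are replaced by uniformly smoothed targets. The function name `info_nce_ls`, the temperature of 0.07, and the smoothing factor of 0.1 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce_ls(image_emb, text_emb, temperature=0.07, smoothing=0.1):
    """Symmetric InfoNCE loss with label smoothing (assumed formulation).

    image_emb, text_emb: arrays of shape (N, D); row i of each is a matched pair.
    temperature, smoothing: hypothetical hyperparameters chosen for illustration.
    """
    # L2-normalise so the dot product is a cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature  # (N, N) pairwise similarities
    n = logits.shape[0]

    # Smoothed targets: each row sums to 1, with most mass on the matched pair.
    targets = np.full((n, n), smoothing / n)
    targets[np.diag_indices(n)] += 1.0 - smoothing

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = -(targets * log_softmax(logits, axis=1)).sum(axis=1).mean()
    loss_t2i = -(targets * log_softmax(logits.T, axis=1)).sum(axis=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(4, 8))  # stand-ins for visual embeddings
    texts = rng.normal(size=(4, 8))   # stand-ins for text embeddings
    print(info_nce_ls(images, texts))
```

With `smoothing=0.0` the function reduces to the ordinary symmetric InfoNCE loss. A positive smoothing value keeps the model from driving the matched-pair probability all the way to 1, which is the usual motivation for label smoothing as a regularizer against overfitting.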

     
