School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
Print publication: 2025
WANG Guo, HE Jianshan, MIN Yongzhi, et al. Research on Voiceprint Recognition Model of High Voltage Shunt Reactor Based on Multi-level Feature Map[J]. High Voltage Engineering, 2025, 51(6): 3030-3042. DOI: 10.13336/j.1003-6520.hve.20240640.
In the field of online voiceprint monitoring of high-voltage shunt reactors, the acoustic signals exhibit long time-series characteristics with inherently high complexity, large data dimensionality, and dispersed energy, which lead to low information utilization, inadequate robustness, and limited recognition accuracy in voiceprint recognition models. To address these issues, an improved ConvNeXt-T network voiceprint recognition model based on multi-level feature maps is proposed. First, the acoustic signal is converted into time-domain and frequency-domain feature maps via the symmetrized dot pattern and a Gram-like matrix graphical representation of the refined spectrum; based on the characteristics of the reactor voiceprint, a 50 Hz Gammatone filter bank is proposed to generate energy feature maps. Then, the lightweight coordinate attention (CA) mechanism is introduced as an adaptive feature-map fusion module to improve the input side of the ConvNeXt-T network. Finally, the superiority of the model is verified on measured data. The results show that the average recognition accuracy of the proposed model on the test set reaches 97.82%, which is 3.14% higher than that of single-domain maps and 6.51% higher than that of comparison models such as FCN, RsNet, and ApR-IDRSN. The model also shows the best noise immunity in Gaussian white noise, human voice, and birdsong environments. By combining high-dimensional multi-domain feature extraction with graphical dimensionality-reduction representation, the model significantly improves feature richness and information utilization.
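The time-domain feature map mentioned above is built with the symmetrized dot pattern (SDP) transform, which maps each sample of a 1-D signal to a polar point and mirrors the points around several symmetry axes. The sketch below is a minimal, generic SDP implementation: the lag `lag`, angular gain `zeta`, and six-fold symmetry are common SDP defaults, not parameters taken from the paper.

```python
import numpy as np

def sdp(x, lag=1, zeta=np.deg2rad(36), arms=6):
    """Symmetrized dot pattern: map a 1-D signal to polar points (r, theta).

    Sample i sets the radius; the lagged sample i+lag sets the angular
    offset from each symmetry axis, drawn once counterclockwise and once
    mirrored. Returns the radii and angles of all plotted points.
    """
    x = np.asarray(x, dtype=float)
    xn = (x - x.min()) / (x.max() - x.min())       # normalize to [0, 1]
    r = xn[:-lag]                                   # radius from sample i
    ang = xn[lag:] * zeta                           # offset from sample i+lag
    radii, thetas = [], []
    for m in range(arms):                           # replicate around the circle
        phi = 2 * np.pi * m / arms                  # symmetry-axis angle
        radii.append(r); thetas.append(phi + ang)   # counterclockwise arm
        radii.append(r); thetas.append(phi - ang)   # mirrored arm
    return np.concatenate(radii), np.concatenate(thetas)

# Example: SDP point cloud of a 100 Hz tone sampled at 8 kHz
t = np.arange(0, 0.1, 1 / 8000)
rr, th = sdp(np.sin(2 * np.pi * 100 * t))
```

Rendering the `(rr, th)` points on polar axes yields the snowflake-like image that serves as the time-domain feature map.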
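The energy feature map rests on a Gammatone filter bank whose center frequencies are spaced at 50 Hz, matching the 50 Hz-harmonic structure of reactor noise. The sketch below builds such a bank from the standard n-th order Gammatone impulse response g(t) = t^(n-1) exp(-2πbt) cos(2πf_c t) and computes per-channel log energies; the ERB-rule bandwidth, channel count, and filter length are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def gammatone_bank(fs=8000, n_ch=16, spacing=50.0, dur=0.064, order=4):
    """Impulse responses of a Gammatone bank with 50 Hz-spaced centers.

    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), normalized to
    unit energy. Bandwidth b follows the ERB rule b = 1.019*(24.7 + 0.108*fc).
    """
    t = np.arange(int(dur * fs)) / fs
    fcs = spacing * np.arange(1, n_ch + 1)          # 50, 100, ..., 800 Hz
    bank = []
    for fc in fcs:
        b = 1.019 * (24.7 + 0.108 * fc)             # ERB-based bandwidth
        g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
        bank.append(g / np.sqrt(np.sum(g ** 2)))    # unit-energy normalization
    return fcs, np.array(bank)

def band_energies(x, bank):
    """Log energy of the signal in each Gammatone channel."""
    e = np.array([np.sum(np.convolve(x, g, mode="same") ** 2) for g in bank])
    return np.log(e + 1e-12)

fs = 8000
fcs, bank = gammatone_bank(fs=fs)
sig = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)  # 1 s test tone at 100 Hz
e = band_energies(sig, bank)
```

Arranging the per-frame energy vectors of successive signal frames as columns produces the 2-D energy feature map fed to the network.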
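The adaptive fusion step uses coordinate attention (CA), which factorizes spatial attention into separate height and width maps. Below is a pure-numpy sketch of CA applied to a 3-channel stack of time/frequency/energy maps; the random matrices stand in for learned 1x1 convolution weights, and the 3-channel layout is an assumption about how the three maps are stacked, not the paper's exact architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, reduction=4, rng=np.random.default_rng(0)):
    """Coordinate attention over x of shape (C, H, W).

    Pools along each spatial axis, mixes channels through a shared
    reduction (random weights as stand-ins for learned 1x1 convs), then
    reweights x with factorized height and width attention maps.
    """
    c, h, w = x.shape
    m = max(c // reduction, 1)
    zh = x.mean(axis=2)                  # (C, H): pool along width
    zw = x.mean(axis=1)                  # (C, W): pool along height
    w1 = rng.standard_normal((m, c))     # shared channel-reduction weights
    wh = rng.standard_normal((c, m))     # height-branch expansion weights
    ww = rng.standard_normal((c, m))     # width-branch expansion weights
    f = np.maximum(w1 @ np.concatenate([zh, zw], axis=1), 0)  # (m, H+W), ReLU
    ah = sigmoid(wh @ f[:, :h])          # (C, H) height attention
    aw = sigmoid(ww @ f[:, h:])          # (C, W) width attention
    return x * ah[:, :, None] * aw[:, None, :]

# Fuse time/frequency/energy maps stacked as 3 channels
maps = np.random.default_rng(1).random((3, 64, 64))
fused = coordinate_attention(maps)
```

Because the attention factors lie in (0, 1), CA rescales rather than reshapes the maps, letting the network learn which domain's map to emphasize at each position before the ConvNeXt-T stem.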