Abstract:
To address the problem that edge devices with limited computational resources cannot support complex, high-precision image recognition models, an object recognition framework based on EfficientDet and a stereo attention mechanism is proposed. First, EfficientNet, which obtains its best architecture by jointly optimizing network depth, width, and input resolution, is adopted as the backbone network. Then, BiFPN is used to enhance the representational capability of the features; its improvement strategies include adding residual connections, removing nodes that have only a single input edge, and weighted feature fusion. On this basis, the stereo attention mechanism integrates multi-dimensional information across scale, space, and channel, further refining and fusing the features through visual attention. Finally, the head networks use these high-quality features to improve object classification and localization. Test results show that, compared with the original EfficientDet, the accuracy of the proposed framework improves by 3.0%, and performance also improves across networks with different parameter counts and structures, providing a valuable reference for efficient image recognition of transmission lines.
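The weighted fusion step in BiFPN can be illustrated with a minimal sketch. It assumes the framework keeps the fast normalized fusion rule of the original EfficientDet, in which each input feature receives a learnable weight that is clipped to be non-negative and then normalized by the sum of all weights. The function name `fast_normalized_fusion`, the flattened list representation of feature maps, and the example values are hypothetical and chosen only for illustration; they are not taken from the proposed framework.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    raw_weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-length (flattened) feature maps with normalized weights.

    Illustrative only: a real BiFPN node operates on tensors and is
    followed by a convolution, which is omitted here.
    """
    if not features or len(features) != len(raw_weights):
        raise ValueError("need one weight per input feature map")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature maps must have the same length")

    # ReLU keeps every learnable weight non-negative.
    weights = [max(0.0, w) for w in raw_weights]
    # eps avoids division by zero when all weights are clipped to 0.
    denom = eps + sum(weights)

    fused = [0.0] * length
    for w, feat in zip(weights, features):
        scale = w / denom
        for i, value in enumerate(feat):
            fused[i] += scale * value
    return fused


# Hypothetical inputs: two feature maps already resized to the same shape.
p_in = [1.0, 2.0, 3.0]
p_td = [3.0, 2.0, 1.0]
fused = fast_normalized_fusion([p_in, p_td], raw_weights=[1.0, 1.0])
```

With equal weights the result is close to the element-wise mean of the inputs; a negative raw weight is clipped to zero, so the corresponding input is ignored.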