Abstract:
Pointer-type meter reading is a key task in industrial digitalization. Current approaches rely mainly on traditional recognition algorithms such as object detection and keypoint localization, which suffer from poor generalization and heavy data dependence. This paper uses a large vision-language model to emulate the knowledge-driven process by which humans read meters, and proposes a general framework for pointer meter reading: (1) To break the data-dependence bottleneck, a multimodal data synthesis pipeline for meter reading in industrial scenarios is constructed, which automatically generates more than 20,000 question-answer pairs; (2) To mitigate the "hallucination" of large models, DeepSeek-R1 is used to simulate human knowledge-based reading and to decouple meter semantic understanding from reading reasoning, reducing the average reference error by 10% compared with the base model Qwen2.5-VL; (3) To improve generalization, a tolerance-adaptive reinforcement learning optimization method based on generalized strategy optimization is designed, which converts absolute accuracy constraints into learnable tolerance intervals to enhance generalization to out-of-distribution (OOD) data. In the OOD test, the reading error of this method is reduced to 2%. Experiments show that the proposed framework achieves an average reference error of 1.2% on the simulated industrial meter test set and 3.16% on the public real meter test set, outperforming advanced large models such as Qwen2.5-VL-72B and GPT-4o. These results provide a reference for applying large vision-language models to fine-grained visual understanding and reasoning-based computation tasks.
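As an illustration of the metric and of the tolerance idea in (3), the following minimal Python sketch computes a reference error and a tolerance-based reward. It assumes the common metrological definition of reference error (absolute error divided by the meter's span); the function names, the fixed tolerance `tol_pct`, the cutoff `cutoff_pct`, and the linear decay are hypothetical choices made for illustration only, and are not the learnable tolerance intervals or the reward actually used in this paper.

```python
def reference_error(pred, truth, range_min, range_max):
    """Absolute reading error as a percentage of the meter's span.

    Assumed definition (common in metrology); the paper's exact formula
    is not given in the abstract.
    """
    span = range_max - range_min
    if span <= 0:
        raise ValueError("range_max must be greater than range_min")
    return abs(pred - truth) * 100.0 / span


def tolerance_reward(pred, truth, range_min, range_max,
                     tol_pct=1.0, cutoff_pct=5.0):
    """Hypothetical reward: full credit inside the tolerance band,
    linear decay up to the cutoff, zero beyond it."""
    err = reference_error(pred, truth, range_min, range_max)
    if err <= tol_pct:
        return 1.0
    if err >= cutoff_pct:
        return 0.0
    # Linear interpolation between the tolerance band and the cutoff.
    return (cutoff_pct - err) / (cutoff_pct - tol_pct)


if __name__ == "__main__":
    # A 0-1.6 MPa pressure gauge read as 0.62 when the true value is 0.60.
    print(reference_error(0.62, 0.60, 0.0, 1.6))
    print(tolerance_reward(0.62, 0.60, 0.0, 1.6))
```

Under this sketch, a reading that falls slightly outside the tolerance band still receives partial credit, which is one simple way to replace an exact-match constraint with a graded signal for reinforcement learning.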