Abstract:
With the rapid development of the Internet of Things (IoT) in power distribution, massive heterogeneous data are constantly generated at the production, transmission, and consumption ends. These data are updated rapidly, are of poor quality, have low value density, and are strongly time-ordered. Extracting high-quality, valuable data from this mass of data while reducing data redundancy requires effective data cleaning and fusion methods. Therefore, a data cleaning and fusion method based on time series similarity measurement is proposed. The method uses symbolic aggregate approximation (SAX), Euclidean distance, and similarity-weighted similar sequences to complete data cleaning, and a multiple heterogeneous data fusion algorithm to complete data fusion. Load data with 1440 points are selected for the experiment. The results show that the method can detect abnormal data, fill in missing data, reduce data redundancy, and integrate heterogeneous data. The method has low computational complexity, and the processed data have high precision and improved quality, providing reliable basic data for distribution IoT applications.
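As a rough illustration of the SAX and Euclidean comparison steps named above, the following is a minimal sketch of textbook SAX (z-normalization, piecewise aggregate approximation, then symbol assignment using equiprobable Gaussian breakpoints) together with a plain Euclidean distance. It is not the authors' implementation: the function names, the segment count, and the alphabet size are all assumptions chosen for illustration, and the similarity-weighted filling and fusion steps are not shown.

```python
import bisect
import math
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Scale a series to zero mean and unit variance (constant series map to zeros)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment."""
    n = len(series)
    if n % segments != 0:
        # Assumption: keep the sketch simple by requiring equal-width segments.
        raise ValueError("series length must be divisible by the segment count")
    width = n // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def sax(series, segments, alphabet_size):
    """Return the SAX word (letters 'a', 'b', ...) for a numeric series."""
    # Breakpoints split the standard normal distribution into equiprobable regions.
    breakpoints = [
        NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)
    ]
    word = []
    for value in paa(znormalize(series), segments):
        index = bisect.bisect_right(breakpoints, value)
        word.append(chr(ord("a") + index))
    return "".join(word)


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.dist(a, b)


# Toy series with a low half and a high half, reduced to two symbols.
print(sax([1, 1, 1, 1, 5, 5, 5, 5], segments=2, alphabet_size=4))  # ad
```

For a 1440-point load curve, a hypothetical choice such as `segments=24` would reduce each curve to a 24-symbol word, so that candidate similar sequences can be screened cheaply on their words before computing `euclidean` on the numeric values.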