Abstract:
Power big data is time-varying. If a mining method cannot process new data in real time and update the association rules between data promptly, its results become delayed and inaccurate. To solve this problem, an incremental mining method for parallel association rules in power big data is proposed, based on the Eclat algorithm. A similarity merging strategy is adopted to eliminate misleading information caused by data redundancy and noise, improving the quality of the power big data. The Eclat algorithm is optimized using the minimum hash (MinHash) principle: a MinHash matrix is established to estimate the candidate itemsets in the original dataset, and the candidate itemsets are pruned to reduce the complexity of data comparison and storage and to improve mining efficiency. The incremental update principle is then used to obtain the updated candidate itemsets, which are combined with the Hash Eclat algorithm to quickly update the existing association rules, achieving incremental mining of parallel association rules in power big data and improving the accuracy of association rule mining. Experimental results show that, when this method is used for association rule mining, I/O usage stays below 200 kB, CPU usage is below 20%, the numbers of missed detections and false positives reach a minimum of 0, network communication volume is as low as 268 MB, and the area under the ROC curve is relatively large. Compared with current mining methods, the proposed method achieves higher mining accuracy and better mining performance.
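As an illustration of the ideas named above, the following is a minimal single-machine sketch, not the method evaluated in this work. It assumes a vertical (item to transaction-ID set) layout as used by Eclat, MinHash signatures built from hypothetical linear hash functions to estimate the overlap of two transaction-ID sets, and a naive incremental step that appends new transactions to the vertical layout and then re-mines. All function names (`make_hashes`, `signature`, `estimate_jaccard`, `estimated_overlap`, `vertical`, `add_increment`, `eclat`) and parameters are assumptions introduced for illustration; the similarity merging and parallelization steps are omitted.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1; transaction IDs are assumed smaller


def make_hashes(k, seed=0):
    """Draw k hypothetical hash functions h(t) = (a*t + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def signature(tidset, hashes):
    """MinHash signature of a non-empty transaction-ID set."""
    return [min((a * t + b) % PRIME for t in tidset) for a, b in hashes]


def estimate_jaccard(sig_x, sig_y):
    """Fraction of matching signature components estimates Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_x, sig_y) if x == y)
    return matches / len(sig_x)


def estimated_overlap(sig_x, sig_y, size_x, size_y):
    """Estimate |X & Y| from J = |X & Y| / |X | Y| and the two set sizes."""
    j = estimate_jaccard(sig_x, sig_y)
    return j * (size_x + size_y) / (1 + j)


def vertical(transactions, start_tid=0):
    """Convert a list of transactions into item -> set of transaction IDs."""
    tidsets = {}
    for tid, items in enumerate(transactions, start=start_tid):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def add_increment(tidsets, new_transactions, start_tid):
    """Naive incremental step: merge new transactions into the vertical layout."""
    for item, tids in vertical(new_transactions, start_tid).items():
        tidsets.setdefault(item, set()).update(tids)


def eclat(tidsets, min_count):
    """Depth-first Eclat: intersect tidsets, keep itemsets with enough support."""
    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) >= min_count:
                itemset = prefix + (item,)
                frequent[itemset] = len(new_tids)
                extend(itemset, new_tids, candidates[i + 1:])

    extend((), set(), sorted(tidsets.items()))
    return frequent
```

For example, with the transactions {a, b, c}, {a, b}, {a, c}, {b, c} and a minimum support count of 2, the itemset {a, b, c} is not frequent; after appending the increment {a, b, c}, {a, b} with `start_tid=4` and re-running `eclat`, it becomes frequent with a count of 2. In this sketch `estimated_overlap` is only an estimator: using it to discard candidates before the exact intersection trades some accuracy for speed, and the threshold for doing so is left open here.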