Citation: SUN Yu, REN Gaoming. An Incremental Mining Method for Parallel Association Rules in Power Big Data Using Eclat Algorithm[J]. Electric Power Information and Communication Technology, 2025, 1(1): 83-88. DOI: 10.16543/j.2095-641x.electric.power.ict.2025.01.11

An Incremental Mining Method for Parallel Association Rules in Power Big Data Using Eclat Algorithm

Abstract: Power big data is time-varying. If a mining method cannot process newly added data in real time and promptly discover the updated association rules among the data, its results may lag behind the data and lose accuracy. To address this, the paper proposes an incremental mining method for parallel association rules in power big data based on the Eclat algorithm. A similar-item merging strategy removes misleading information caused by data redundancy and noise, improving the quality of the power big data. The Eclat algorithm is optimized with the min-hash (MinHash) principle: a MinHash matrix is built to estimate the candidate itemsets of the original dataset, and these candidates are pruned to reduce the complexity of data comparison and storage and to improve mining efficiency. The incremental update principle is then used to obtain the updated candidate itemsets, which are combined with the Hash Eclat algorithm to quickly update the existing association rules, achieving incremental mining of parallel association rules and improving mining accuracy. Experimental results show that, when mining association rules with this method, I/O usage stays below 200 kB, CPU usage is below 20%, the numbers of missed detections and false alarms are as low as 0, network traffic is as low as 268 MB, and the area under the ROC curve is relatively large. Compared with current mining methods, the proposed method achieves higher mining accuracy and better mining performance.
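The abstract names three generic building blocks: an Eclat-style vertical layout (item to transaction-ID set), MinHash signatures used to estimate candidate support before pruning, and incremental updates when new records arrive. The following is a minimal single-machine sketch of how these pieces can fit together; it is not the authors' Hash Eclat implementation, and it omits the similar-item merging and parallel execution described in the paper. All names (`VerticalIndex`, `estimated_support`, `slack`, `NUM_HASHES`) and the linear hash family are assumptions made for illustration.

```python
import random
from itertools import combinations

PRIME = 2_147_483_647   # assumed upper bound on transaction IDs (a Mersenne prime)
NUM_HASHES = 64         # hypothetical signature length

_rng = random.Random(0)  # fixed seed so signatures are reproducible
HASH_PARAMS = [(_rng.randrange(1, PRIME), _rng.randrange(0, PRIME))
               for _ in range(NUM_HASHES)]


class VerticalIndex:
    """Eclat-style vertical layout (item -> tidset) with one MinHash signature per item."""

    def __init__(self):
        self.tidsets = {}      # item -> set of transaction IDs
        self.signatures = {}   # item -> list of NUM_HASHES minimum hash values
        self.next_tid = 0

    def add_transactions(self, transactions):
        """Append new transactions; only the items they contain are touched."""
        for items in transactions:
            tid = self.next_tid
            self.next_tid += 1
            hashes = [(a * tid + b) % PRIME for a, b in HASH_PARAMS]
            for item in set(items):
                self.tidsets.setdefault(item, set()).add(tid)
                sig = self.signatures.setdefault(item, [PRIME] * NUM_HASHES)
                # A MinHash signature is a running minimum, so it updates in place.
                for i, h in enumerate(hashes):
                    if h < sig[i]:
                        sig[i] = h

    def estimated_support(self, x, y):
        """Estimate |T(x) & T(y)| from signatures, without intersecting tidsets."""
        sx, sy = self.signatures[x], self.signatures[y]
        j = sum(a == b for a, b in zip(sx, sy)) / NUM_HASHES  # Jaccard estimate
        # From J = I / (|A| + |B| - I) it follows that I = J (|A| + |B|) / (1 + J).
        return j * (len(self.tidsets[x]) + len(self.tidsets[y])) / (1 + j)

    def exact_support(self, x, y):
        """Classic Eclat step: support of {x, y} is the size of the tidset intersection."""
        return len(self.tidsets[x] & self.tidsets[y])

    def frequent_pairs(self, min_support, slack=0.5):
        """Prune pairs by estimated support, then verify the survivors exactly."""
        frequent = [i for i, t in self.tidsets.items() if len(t) >= min_support]
        result = {}
        for x, y in combinations(sorted(frequent), 2):
            if self.estimated_support(x, y) < slack * min_support:
                continue  # heuristic prune; may occasionally drop a true pair
            support = self.exact_support(x, y)
            if support >= min_support:
                result[(x, y)] = support
        return result


idx = VerticalIndex()
idx.add_transactions([["a", "b"], ["a", "b"], ["a", "b"], ["c"], ["c"]])
print(idx.frequent_pairs(min_support=2))  # {('a', 'b'): 3}
idx.add_transactions([["a", "c"], ["a", "c"]])  # incremental batch, no rebuild
print(idx.exact_support("a", "c"))  # 2
```

Because the MinHash estimate is approximate, pruning on it trades a small risk of missed itemsets for fewer exact intersections; the hypothetical `slack` factor controls that trade-off, and survivors are always checked against the exact tidset intersection.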

     
