XU Ning, WANG Yanqin, DONG Zhen, WANG Yong. Research on Distribution System Big Data Preprocessing Technology Based on Apache Spark[J]. Journal of North China Electric Power University (Natural Science Edition), 2021, 48(2): 40-46,54.

Research on Distribution System Big Data Preprocessing Technology Based on Apache Spark

Abstract: As the volume of data collected from distribution networks keeps growing, preprocessing that data efficiently has become one of the key problems in distribution network data analysis. Given the complexity of distribution system big data, this paper proposes a parallel preprocessing method for large-scale datasets based on Apache Spark. First, a big data parallel computing platform is set up with Spark as its computing engine, so that distribution system big data can be processed more efficiently. Next, several common problems in current distribution system big data are analyzed, and data governance schemes are proposed to address them. Then, the specific preprocessing workflow for distribution network big data on the Spark computing engine is described. Finally, experiments verify two results: data preprocessing improves the accuracy of distribution network data prediction, and the distributed computing platform offers a speed advantage in data preprocessing.
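The abstract does not spell out the paper's data governance rules. As a rough illustration only, the following Python sketch shows the kind of per-partition cleaning such a pipeline might apply to meter readings. It removes duplicate timestamps, masks values more than k median absolute deviations from the median as outliers (a swapped-in rule, not necessarily the authors'), and fills gaps by linear interpolation. The record layout `(meter_id, timestamp, value)`, the function names `partition_by_meter`, `clean_series` and `preprocess`, and the threshold `k = 3` are all hypothetical. Plain Python loops stand in for Spark, where each meter's partition would instead run as an independent task on an executor.

```python
from collections import defaultdict
from statistics import median
from typing import Optional

# Hypothetical record layout: (meter_id, timestamp, value); None marks a missing sample.
Reading = tuple[str, int, Optional[float]]


def partition_by_meter(readings: list[Reading]) -> dict[str, list[tuple[int, Optional[float]]]]:
    """Group readings by meter ID, much as a key-based partitioning would in Spark."""
    parts = defaultdict(list)
    for meter_id, ts, value in readings:
        parts[meter_id].append((ts, value))
    return parts


def clean_series(series, k: float = 3.0):
    """Clean one meter's series: dedupe, mask outliers, then interpolate gaps."""
    # 1. Duplicates: keep the first value seen for each timestamp, sorted by time.
    first_seen = {}
    for ts, value in series:
        first_seen.setdefault(ts, value)
    times = sorted(first_seen)
    values = [first_seen[ts] for ts in times]

    # 2. Outliers (hypothetical rule): mask values more than k median absolute
    #    deviations (MAD) away from the median of the observed values.
    observed = [v for v in values if v is not None]
    if observed:
        med = median(observed)
        mad = median(abs(v - med) for v in observed)
        if mad > 0:
            values = [None if v is not None and abs(v - med) > k * mad else v for v in values]

    # 3. Missing values: linear interpolation between the nearest known neighbours;
    #    gaps at either end copy the nearest known value.
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(zip(times, values))
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            frac = (times[i] - times[left]) / (times[right] - times[left])
            filled[i] = values[left] + frac * (values[right] - values[left])
    return list(zip(times, filled))


def preprocess(readings):
    """Partition by meter, then clean each partition independently.

    On Spark, each partition would run as a separate task on an executor;
    here a dict comprehension simply processes them one after another.
    """
    return {meter: clean_series(series) for meter, series in partition_by_meter(readings).items()}


raw = [
    ("m1", 0, 10.0), ("m1", 1, None), ("m1", 2, 12.0), ("m1", 2, 12.0),
    ("m1", 3, 500.0), ("m1", 4, 14.0),
    ("m2", 0, 5.0), ("m2", 1, 5.5),
]
cleaned = preprocess(raw)
print(cleaned["m1"])  # → [(0, 10.0), (1, 11.0), (2, 12.0), (3, 13.0), (4, 14.0)]
```

The median-based rule is used here rather than a mean and standard deviation rule. With only a handful of samples, a single spike such as the 500.0 reading inflates the standard deviation enough that a 3-sigma rule would not flag it, whereas the MAD rule does.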

     
