Hadoop-Based Data Processing for Wide Area Measurement Systems
Abstract: To address data redundancy and low processing efficiency in the handling of massive wide area measurement system (WAMS) data, a Hadoop-based cloud computing platform for WAMS data processing is designed and implemented. First, the architecture of the platform is presented. Second, a method for loading massive WAMS data into the Hadoop distributed file system (HDFS) is designed, together with a workflow that uses the MapReduce model to perform extraction, transformation, and loading (ETL) on multiple files in parallel. An MPApriori data mining algorithm combined with MapReduce is then proposed to efficiently mine the mutual influence among stations during cascading failures. Finally, processing of actual WAMS data from a regional power grid verifies the efficiency of Hadoop for massive data. The platform is suited to file-based processing of massive power grid data on computer clusters connected by a high-performance local area network.
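The abstract does not describe how MPApriori works internally, so the following is only a minimal sketch of the general idea of counting itemset support in a MapReduce style, which is the counting step of one Apriori pass. It is not the authors' algorithm. The function names, the station labels, and the sample events are all hypothetical, and the sketch runs in a single process and enumerates every k-item subset instead of using Apriori's candidate pruning.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, k):
    """Map step: emit (itemset, 1) for every k-item subset of one transaction."""
    for itemset in combinations(sorted(set(transaction)), k):
        yield itemset, 1


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each itemset."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def frequent_itemsets(transactions, k, min_support):
    """Return the k-item sets that occur in at least min_support transactions."""
    pairs = (pair for t in transactions for pair in map_phase(t, k))
    counts = reduce_phase(pairs)
    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical data: each record lists the stations that raised
# alarms within one event window (labels are made up).
events = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C"],
    ["A", "B", "D"],
]

result = frequent_itemsets(events, k=2, min_support=2)
print(result)
```

On a real Hadoop cluster the map and reduce steps would run as distributed tasks over HDFS file splits. Here they are plain Python functions, so the data flow is easy to follow.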