Abstract:
As modern electric power systems merge with Information and Communication Technology (ICT), big data technology is becoming an essential tool for power system operation and monitoring, especially for large and complex systems. Although many big data technologies have been applied in other industries in recent years, their practicality in electrical utilities is often limited, because real-time processing and advanced analysis remain a challenge in power systems. Big data techniques and advanced analysis tailored to smart grid applications are therefore needed. This paper investigates the application of big data technologies, and of effective data analysis methods built on them, to fault diagnosis of network components. With the increasing deployment of sensors in the network, a large amount of information is produced in a timely manner, and the resulting abundant historical data are useful for diagnosing faults in network components such as transformers. This paper proposes a parallel K-means algorithm based on the MapReduce model to classify transformer faults using a large set of historical Dissolved Gas Analysis (DGA) data. In this algorithm, the map function assigns each training sample to its nearest center, and the subsequent reduce function updates the centers. The proposed algorithm is implemented in Java and run on a 4-node Hadoop cluster. It performs well in transformer fault diagnosis, especially on large data sets. In addition, cases with different numbers of nodes and different dataset sizes are studied and compared to evaluate speedup, scaleup, and sizeup, and relatively good performance is found in all three respects.
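The map and reduce roles described above can be illustrated with a minimal single-machine sketch in plain Java. It is not the Hadoop implementation used in this work: the class name `KMeansMapReduceSketch`, its method names, the use of squared Euclidean distance, and the two-dimensional sample values are assumptions chosen for illustration, and the in-memory grouping step merely stands in for the shuffle that Hadoop would perform between the two phases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-machine illustration of one K-means iteration
// split into a map step and a reduce step (no Hadoop dependency).
class KMeansMapReduceSketch {

    // Map step: emit the index of the center nearest to one sample.
    // Squared Euclidean distance is an assumption of this sketch.
    static int nearestCenter(double[] sample, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < sample.length; d++) {
                double diff = sample[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Reduce step: the updated center is the mean of the samples assigned to it.
    static double[] mean(List<double[]> samples) {
        double[] sum = new double[samples.get(0).length];
        for (double[] s : samples) {
            for (int d = 0; d < sum.length; d++) {
                sum[d] += s[d];
            }
        }
        for (int d = 0; d < sum.length; d++) {
            sum[d] /= samples.size();
        }
        return sum;
    }

    // One iteration: map every sample, group by center index, then reduce each group.
    static double[][] iterate(double[][] data, double[][] centers) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] sample : data) {
            int key = nearestCenter(sample, centers);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(sample);
        }
        double[][] updated = new double[centers.length][];
        for (int c = 0; c < centers.length; c++) {
            List<double[]> members = groups.get(c);
            // A center with no assigned samples is left unchanged.
            updated[c] = (members == null) ? centers[c].clone() : mean(members);
        }
        return updated;
    }

    public static void main(String[] args) {
        // Made-up 2-D points, not real DGA measurements.
        double[][] data = {{1.0, 1.0}, {1.0, 3.0}, {9.0, 9.0}, {11.0, 9.0}};
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        double[][] updated = iterate(data, centers);
        System.out.println(Arrays.toString(updated[0]) + " " + Arrays.toString(updated[1]));
    }
}
```

In a distributed setting, each node would run the map step on its own partition of the data, and only the grouped samples (or their partial sums) would need to be exchanged before the reduce step. `iterate` would be called repeatedly until the centers stop changing.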