Abstract:
In the big data era, the petroleum industry needs to fully exploit the potential value of its data. Although data mining has achieved remarkable results in many industries, its application to hydrocarbon exploration and development is still at an early stage, mainly because of the particular nature of the data and of its specific applications in this field. Common data mining algorithms can be divided into regression, classification, clustering, estimation, prediction, association analysis and so on. Among them, regression and classification are the most mature and the most widely used. However, for a specific research object, research question and data resource, each regression or classification algorithm has its own range of applicability, so the appropriate algorithm must be selected for the data set and the problem at hand. Taking the oil test data of the Tahe oilfield as an example, with formation factor and reservoir classification as the mining objects, the applicability of common regression and classification algorithms is analyzed in detail. The results show that, for common petroleum industry data and study objects, the optimal regression algorithm is the back propagation neural network (BPNN), followed by support vector machine regression (R-SVM) and multivariate regression analysis (MRA); the optimal classification algorithm is support vector machine classification (C-SVM), followed by Bayesian stepwise discrimination (BAYSD); MRA and BAYSD can also be used for data dimensionality reduction, with the latter performing better; R-type cluster analysis (RCA) can also be used for data dimensionality reduction, while Q-type cluster analysis (QCA) can be adopted for sample reduction; and in any specific data mining application, the algorithm must be selected according to the specific data set.
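
The selection step described above, choosing the algorithm that suits a given data set, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the procedure used in this study: the two candidates (a constant-mean predictor and a one-variable least-squares fit) are simple stand-ins for algorithms such as MRA, R-SVM and BPNN, the numbers are synthetic and are not Tahe oilfield data, and the error measure (mean absolute relative error on held-out samples) and the helper names are hypothetical.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline stand-in: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Stand-in for a regression algorithm: one-variable ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mean_abs_relative_error(model, xs, ys):
    """Assumed error measure: mean of |prediction - actual| / |actual|."""
    return mean(abs(model(x) - y) / abs(y) for x, y in zip(xs, ys))

def select_algorithm(candidates, train, test):
    """Fit every candidate on the training samples and score it on the
    held-out samples; return the lowest-error candidate and all scores."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mean_abs_relative_error(model, *test)
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic (x, y) samples, purely illustrative.
train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
test = ([5.0, 6.0], [10.1, 11.9])

best, scores = select_algorithm(
    {"mean": fit_mean, "linear": fit_linear}, train, test
)
print(best, scores)
```

In practice, the candidate dictionary would hold the actual algorithms under comparison, and the ranking would be recomputed for each new data set, since the order obtained on one data set need not carry over to another.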