Informatics and Applications
2019, Volume 13, Issue 3, pp 34-40
HYBRID EXTREME GRADIENT BOOSTING MODELS TO IMPUTE THE MISSING DATA IN PRECIPITATION RECORDS
- A. K. Gorshenin
- O. P. Martynov
Abstract
The article compares the classical extreme gradient boosting method implemented in the XGBoost (eXtreme Gradient Boosting) framework with the newer modification CatBoost (Categorical Boosting), which is still rarely used in scientific research. Hybrid classification-regression models are proposed to improve the accuracy of imputing missing values in real data from 14 meteorological stations in Germany. The classification accuracy reaches up to 92%, and the root-mean-square errors are moderate. The hybrid methods outperform both pure classification and pure regression models in prediction accuracy. The proposed approaches can be used for analyzing meteorological data with machine learning methods and for improving the forecasting accuracy of physical models of atmospheric processes.
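The abstract does not spell out how the classification and regression stages are combined. The sketch below shows one plausible two-stage scheme and is not necessarily the authors' design: a classifier first decides whether a day is wet or dry, and a regressor trained only on wet days estimates the amount. A small pure-Python stump booster stands in for XGBoost or CatBoost so that the example has no dependencies. The single feature (precipitation at a neighbouring station), the synthetic data, and all function names are assumptions made for illustration.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def fit_stump(xs, residuals):
    """Least-squares stump on one feature: (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate split points
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # constant feature: no split is possible
        m = sum(residuals) / len(residuals)
        return xs[0], m, m
    return best[1:]

def boost(xs, ys, neg_gradient, init=0.0, rounds=50, lr=0.3):
    """Gradient boosting with stumps; returns a function x -> raw score."""
    stumps, scores = [], [init] * len(xs)
    for _ in range(rounds):
        # Each stump is fitted to the negative gradient of the loss.
        residuals = [neg_gradient(y, s) for y, s in zip(ys, scores)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lr * lv, lr * rv))
        scores = [s + (lr * lv if x <= t else lr * rv) for x, s in zip(xs, scores)]
    return lambda x: init + sum(lv if x <= t else rv for t, lv, rv in stumps)

# Synthetic toy data (assumption): daily precipitation in mm at a
# neighbouring station (feature) and at the target station (label).
neighbour = [0.0, 0.0, 0.1, 0.2, 0.0, 1.5, 2.0, 3.5, 5.0, 0.3, 4.0, 0.0, 6.0, 2.5, 0.1, 7.0]
target = [0.0, 0.0, 0.0, 0.0, 0.0, 1.2, 1.8, 3.0, 4.6, 0.0, 3.9, 0.0, 5.5, 2.2, 0.0, 6.4]

# Stage 1: wet/dry classifier trained with the logistic-loss gradient.
wet_label = [1.0 if y > 0 else 0.0 for y in target]
clf = boost(neighbour, wet_label, lambda y, s: y - sigmoid(s))

# Stage 2: amount regressor trained on wet days only (squared-loss gradient).
wet_x = [x for x, y in zip(neighbour, target) if y > 0]
wet_y = [y for y in target if y > 0]
reg = boost(wet_x, wet_y, lambda y, s: y - s, init=sum(wet_y) / len(wet_y))

def impute(x):
    """Hybrid prediction: 0 mm if classified dry, else the regressor's amount."""
    if sigmoid(clf(x)) < 0.5:
        return 0.0
    return max(0.0, reg(x))

# Training-set metrics only; a real study would evaluate on held-out records.
pred = [impute(x) for x in neighbour]
accuracy = sum((p > 0) == (y > 0) for p, y in zip(pred, target)) / len(target)
rmse = math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, target)) / len(target))
print(f"wet/dry accuracy: {accuracy:.2f}, RMSE: {rmse:.2f} mm")
```

Gating the regressor with a classifier lets the model return exact zeros on dry days, which a single regression model rarely does. In practice the two `boost` calls would be replaced by a library classifier and regressor, such as those provided by XGBoost or CatBoost.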
About this article
Cover Date
2019-09-30
DOI
10.14357/19922264190306
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
data imputation; precipitation; classification; regression; gradient boosting; XGBoost; CatBoost
Authors
A. K. Gorshenin and O. P. Martynov
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow, 119991, Russian Federation