Informatics and Applications
2020, Volume 14, Issue 1, pp 63-70
ON METHODS FOR IMPROVING THE ACCURACY OF MULTICLASS CLASSIFICATION ON IMBALANCED DATA
- L. A. Sevastianov
- E. Yu. Shchetinin
Abstract
This paper studies methods for overcoming class imbalance so that classification accuracy exceeds that obtained by applying classification algorithms directly to imbalanced data. A scheme for improving classification accuracy is proposed: classes are first balanced by random sampling, SMOTE (Synthetic Minority Oversampling TEchnique), or ADASYN (ADAptive SYNthetic sampling), and classification algorithms are then combined with feature selection methods such as RFE (Recursive Feature Elimination), Random Forest, and Boruta. Computer experiments on skin disease data showed that using sampling algorithms to eliminate class imbalance, together with selecting the most informative features, significantly increases classification accuracy. The highest classification accuracy was achieved by the Random Forest algorithm on data resampled with ADASYN.
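The scheme described in the abstract (balance the classes, select features, then classify) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic rather than the skin disease data, the function names `smote_like_oversample` and `run_scheme` are hypothetical, and the oversampler is a simplified SMOTE-style interpolation written in NumPy rather than ADASYN. All parameter values (number of features kept, number of trees, class weights) are arbitrary choices for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split


def smote_like_oversample(X, y, k=5, seed=0):
    """Grow every class to the majority-class size by interpolating between a
    sample and one of its k nearest same-class neighbours (the SMOTE idea).
    Simplified stand-in, not a full SMOTE or ADASYN implementation."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0 or count < 2:
            continue  # already at target size, or too few points to interpolate
        Xc = X[y == cls]
        # Pairwise distances inside the class; a point is not its own neighbour.
        dist = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)
        neighbours = np.argsort(dist, axis=1)[:, :min(k, count - 1)]
        base = rng.integers(0, count, size=n_new)
        pick = rng.integers(0, neighbours.shape[1], size=n_new)
        mate = neighbours[base, pick]
        gap = rng.random((n_new, 1))  # position along the segment base -> mate
        X_parts.append(Xc[base] + gap * (Xc[mate] - Xc[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


def run_scheme(seed=0):
    # Synthetic three-class imbalanced data (assumption: stands in for real data).
    X, y = make_classification(
        n_samples=600, n_features=20, n_informative=8, n_redundant=2,
        n_classes=3, weights=[0.7, 0.2, 0.1], random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    # Balance only the training split; the test split keeps its imbalance.
    X_bal, y_bal = smote_like_oversample(X_tr, y_tr, seed=seed)
    # Recursive feature elimination driven by random forest importances.
    selector = RFE(RandomForestClassifier(n_estimators=50, random_state=seed),
                   n_features_to_select=10, step=2)
    selector.fit(X_bal, y_bal)
    # Final classifier trained on the balanced data with the selected features.
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(selector.transform(X_bal), y_bal)
    pred = clf.predict(selector.transform(X_te))
    return y_bal, selector.support_, balanced_accuracy_score(y_te, pred)


if __name__ == "__main__":
    y_bal, support, score = run_scheme()
    print("class counts after balancing:", np.bincount(y_bal))
    print("selected feature indices:", np.flatnonzero(support))
    print("balanced accuracy on the untouched test split:", round(score, 3))
```

Resampling only the training split is a design choice of this sketch, made so that the reported score reflects the original class distribution. In practice, the `SMOTE` and `ADASYN` classes in the imbalanced-learn package could replace the hand-written oversampler.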
About this article
Cover Date
2020-03-30
DOI
10.14357/19922264200109
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
imbalanced data; classification; sampling; random forest; ADASYN; SMOTE
Author Affiliations
Peoples' Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Str., Moscow 117198, Russian Federation
Financial University under the Government of the Russian Federation, 49 Leningradsky Prospekt, Moscow 125993, Russian Federation