Informatics and Applications
2016, Volume 10, Issue 3, pp 32-40
SIGNIFICANCE TESTS OF FEATURE SELECTION FOR CLASSIFICATION
Abstract
The paper considers the problem of feature selection for classification and the related problem of assessing the quality of the resulting solutions. Among feature selection methods, attention is focused on sequential procedures, and the probability of correct classification is used as the measure of classification quality. Cross-validation and the bootstrap are proposed for estimating this probability. To analyze the resulting set of sample values of the probability of correct classification, the paper suggests a comparative analysis of confidence intervals and a test for homogeneity of binomial proportions. The Bayesian classifier is built on a mixture of normal distributions as the data model, with the model parameters estimated by the expectation-maximization algorithm. As an experiment, the paper considers the well-founded choice of classification features for predicting the type of urinary stones in urology. It is demonstrated that the feature set can be reduced not only without loss of decision quality but also with an increase in the probability of correctly predicting the stone type.
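To make the comparison of correct-classification probabilities concrete, the following is a minimal Python sketch of the two tools named in the abstract: a confidence interval for a binomial proportion and a test for homogeneity of two binomial proportions. The abstract does not say which interval or test statistic the paper uses, so the Wilson score interval and the Pearson chi-square statistic are assumptions chosen here for illustration. The function names and the counts in the usage lines are hypothetical, and the test assumes that the two accuracy estimates come from independent samples.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def homogeneity_test(successes, trials):
    """Pearson chi-square test that two binomial proportions are equal.

    successes, trials: sequences of length 2 (correct classifications and
    number of test objects for each feature subset).
    Returns (statistic, p_value); the statistic has 1 degree of freedom.
    """
    if len(successes) != 2 or len(trials) != 2:
        raise ValueError("this sketch handles exactly two groups")
    p_pool = sum(successes) / sum(trials)
    if p_pool in (0.0, 1.0):
        # All outcomes identical: no evidence against homogeneity.
        return 0.0, 1.0
    stat = sum(
        (s - n * p_pool) ** 2 / (n * p_pool * (1 - p_pool))
        for s, n in zip(successes, trials)
    )
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 90 of 100 correct with a reduced feature set,
# 70 of 100 correct with the full feature set.
low, high = wilson_interval(90, 100)
stat, p_value = homogeneity_test([90, 70], [100, 100])
print(f"Wilson interval for 90/100: ({low:.3f}, {high:.3f})")
print(f"chi-square = {stat:.2f}, p-value = {p_value:.4g}")
```

A small p-value suggests that the two feature subsets differ in their probability of correct classification, while overlapping confidence intervals suggest that the observed difference may be due to sampling variation.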
About this article
Cover Date
2016-08-30
DOI
10.14357/19922264160305
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
feature selection; sequential forward and backward selections; Bayes classification; test of homogeneity of binomial proportions; prediction of stone types in urology
Authors
M. P. Krivenko
Author Affiliations
Institute of Informatics Problems, Federal Research Center “Computer Sciences and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation