Informatics and Applications
2017, Volume 11, Issue 3, pp 27-33
SUPERVISED LEARNING CLASSIFICATION OF INCOMPLETE CLINICAL DATA
Abstract
The article examines the effectiveness of classification methods for incomplete clinical data. A Bayesian classifier is trained by maximum likelihood under a model of a mixture of normal distributions. A rigorous derivation of the formulas that implement the steps of the EM algorithm makes it possible to apply the iterative estimation of the mixture parameters correctly. For incomplete data, methods are proposed for selecting initial values and for correcting degenerate covariance matrices of the mixture components. The experimental part analyzes how classification quality depends on the number of missing individual values, using enzyme data obtained for patients with liver diseases. On these real data, simple and complex methods of handling missing values gave almost identical classification errors when the number of randomly missing individual values was low.
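To illustrate the kind of classifier the abstract describes, the following is a minimal sketch and not the author's implementation. It assumes a single normal component per class (a simplification of the mixture model), parameters that are already estimated, and missing values encoded as NaN. It relies on the fact that the marginal of a normal distribution over the observed coordinates is again normal, with the corresponding sub-vector of the mean and sub-matrix of the covariance. The function names and the numeric values are hypothetical.

```python
import numpy as np

def marginal_log_density(x, mean, cov):
    """Log-density of the observed entries of x under N(mean, cov).

    Missing entries of x are NaN. Only the observed coordinates are used,
    which corresponds to integrating the missing ones out of the density.
    """
    obs = ~np.isnan(x)
    d = int(obs.sum())
    if d == 0:
        return 0.0  # nothing observed: the class density adds no information
    diff = x[obs] - mean[obs]
    sub_cov = cov[np.ix_(obs, obs)]          # covariance of observed coordinates
    _, logdet = np.linalg.slogdet(sub_cov)   # stable log-determinant
    maha = diff @ np.linalg.solve(sub_cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def classify(x, priors, means, covs):
    """Return the index of the class with the largest posterior score."""
    scores = [
        np.log(p) + marginal_log_density(x, m, c)
        for p, m, c in zip(priors, means, covs)
    ]
    return int(np.argmax(scores))

# Hypothetical two-class setup with two features (illustrative values only).
priors = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]

x_incomplete = np.array([np.nan, 2.9])  # first feature is missing
label = classify(x_incomplete, priors, means, covs)
```

Handling a full mixture per class would replace `marginal_log_density` with a log-sum over weighted components; the marginalization over missing coordinates stays the same.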
Cover Date
2017-09-30
DOI
10.14357/19922264170303
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
missing data; EM algorithm; mixtures of normal distributions
Authors
M. P. Krivenko
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation