Informatics and Applications
2023, Volume 17, Issue 2, pp 50-56
CRITERIA FOR CHOOSING THE FACTORIZATION MODEL DIMENSIONALITY
Abstract
The paper addresses the choice of model dimension for matrix factorization in the presence of missing elements. The parameters of the adopted data model are estimated by multidimensional optimization. Estimating the reduced dimensionality is a typical model selection problem: alternatives arise during data analysis, and the choice means either ranking the individual options or singling out the "best" representative. Commonly applied selection criteria are based on the likelihood function, which requires probabilistic assumptions about the data. When the parameters of the factor model under consideration are estimated, however, no such assumptions are made, and introducing them is impractical because it would compromise the generality of the dimensionality reduction problem as formulated. Therefore, an attempt was made to reuse the available data for statistical inference. None of the existing approaches (bootstrap, jackknife, cross-validation, and permutation tests) is suitable, so an original method is proposed that generates new data by additionally omitting elements of the original matrix. To process the resulting samples, a mixture of normal distributions is used in combination with kernel smoothing.
The proposed solutions make it possible to correctly justify the dimensionality of the adopted factorization model. The exposition is illustrated by an example of synthetic data processing.
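The resampling idea described in the abstract (forming new samples by omitting additional elements of the original matrix) can be illustrated with a minimal sketch. It is not the author's algorithm: the alternating least squares fit, the ridge term, the fraction of hidden elements, the RMSE score, and the names `fit_low_rank` and `extra_omission_errors` are all assumptions made for illustration. The mixture-of-normals and kernel smoothing step is not covered.

```python
import numpy as np

def fit_low_rank(X, mask, rank, n_iter=30, ridge=1e-3, seed=0):
    """Fit X ~ U @ V.T by alternating least squares, using only entries
    where mask is True. The small ridge term is an assumption that keeps
    each least-squares system solvable."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    reg = ridge * np.eye(rank)
    for _ in range(n_iter):
        for i in range(n):  # update row factors with V fixed
            obs = mask[i]
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg, Vo.T @ X[i, obs])
        for j in range(m):  # update column factors with U fixed
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg, Uo.T @ X[obs, j])
    return U, V

def extra_omission_errors(X, mask, rank, n_rep=10, frac=0.1, seed=0):
    """For each replicate, hide a random fraction of the observed entries,
    refit the model, and record the RMSE on the hidden entries."""
    rng = np.random.default_rng(seed)
    obs_idx = np.flatnonzero(mask)  # flat indices of observed entries
    n_hide = max(1, int(frac * obs_idx.size))
    errors = np.empty(n_rep)
    for r in range(n_rep):
        hidden = rng.choice(obs_idx, size=n_hide, replace=False)
        train = mask.copy()
        train.flat[hidden] = False  # the additional omissions
        U, V = fit_low_rank(X, train, rank, seed=r)
        resid = (X - U @ V.T).flat[hidden]
        errors[r] = np.sqrt(np.mean(resid ** 2))
    return errors

# Synthetic example (hypothetical sizes): a rank-3 matrix with noise and gaps.
rng = np.random.default_rng(42)
X = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))
X += 0.05 * rng.standard_normal(X.shape)
mask = rng.random(X.shape) < 0.85  # True where an entry is observed
X[~mask] = np.nan                  # missing entries are never read by the fit

for k in range(1, 6):
    errs = extra_omission_errors(X, mask, rank=k, n_rep=5)
    print(f"rank {k}: mean held-out RMSE = {errs.mean():.3f}")
```

Each candidate dimensionality yields a sample of held-out errors rather than a single number, so the candidates can be compared through the distributions of these errors without placing probabilistic assumptions on the data.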
About this article
Cover Date
2023-07-10
DOI
10.14357/19922264230207
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
low-rank matrix approximation; missing data; criteria for model selection; resampling methods; kernel smoothing
Authors
M. P. Krivenko
Author Affiliations
Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation