Informatics and Applications
2025, Volume 19, Issue 3, pp 67-72
CLASSIFICATION OF SMALL SETS OF DATA OF LARGE DIMENSION
- A. A. Grusho
- N. A. Grusho
- M. I. Zabezhailo
- V. V. Kulchenkov
- E. E. Timonina
Abstract
The problem of classifying data of very large dimension is considered when only a limited set of training samples is available. Under these conditions, the possibility of using cause-and-effect relationships to solve such classification problems is examined. The solution relies on the existence of cause-and-effect relationships between unknown causes and the observed, partially determined effects of these causes in incoming new data. Training uses a small set of data. The problems are solved under the condition that the size of the data and the number of possible data properties tend to infinity. Asymptotic conditions for unambiguous classification of new data are found. In a particular case, the classification problem is investigated in the presence of random distortions of the deterministic effects in the data. Conditions under which training without a teacher (unsupervised learning) is possible are formulated. The work demonstrates that cause-and-effect relationships can, in principle, be applied to medical diagnostics, the detection of fraudulent schemes in the financial sector, and the assessment of situational awareness in cybersecurity.
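The abstract does not spell out the classification procedure, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes that each object is a set of integer property identifiers, that the unknown cause of a class deterministically places a fixed set of properties (its effect) in every object of that class, and that this effect can be estimated from a small training sample as the properties shared by all samples of the class. The names `learn_effects` and `classify` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional


def learn_effects(
    training: Dict[str, List[Iterable[int]]]
) -> Dict[str, FrozenSet[int]]:
    """Estimate each class's effect as the properties common to all its samples.

    Assumption (not from the paper): a deterministic effect appears in every
    object of the class, so intersecting a few samples removes most of the
    incidental properties.
    """
    effects: Dict[str, FrozenSet[int]] = {}
    for label, samples in training.items():
        sets = [frozenset(s) for s in samples]
        if not sets:
            raise ValueError(f"class {label!r} has no training samples")
        effects[label] = frozenset.intersection(*sets)
    return effects


def classify(
    obj: Iterable[int], effects: Dict[str, FrozenSet[int]]
) -> Optional[str]:
    """Return the single class whose estimated effect is contained in obj.

    Returns None when no class matches or when several match, i.e. when
    the classification is not unambiguous.
    """
    observed = frozenset(obj)
    matches = [
        label
        for label, effect in effects.items()
        if effect and effect <= observed  # skip empty (uninformative) effects
    ]
    return matches[0] if len(matches) == 1 else None


# Two training samples per class; properties 10, 20, 30 are incidental.
training = {
    "A": [{1, 2, 3, 10}, {1, 2, 3, 20}],
    "B": [{4, 5, 6, 10}, {4, 5, 6, 30}],
}
effects = learn_effects(training)
label_for_new_object = classify({1, 2, 3, 99}, effects)
```

In this toy setting, an object that contains the estimated effects of both classes, or of neither, is left unclassified. Random distortions of the effects, which the abstract treats as a special case, are not modelled here.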
About this article
Cover Date
2025-10-10
DOI
10.14357/19922264250308
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
classification of data of large dimension; artificial intelligence; cause-and-effect relationships
Authors
A. A. Grusho, N. A. Grusho, M. I. Zabezhailo, V. V. Kulchenkov, and E. E. Timonina
Author Affiliations
 Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
 VTB Bank, 43-1 Vorontsovskaya Str., Moscow 109147, Russian Federation