Systems and Means of Informatics
2022, Volume 32, Issue 4, pp 14-20
DETECTION OF DISTRIBUTION DRIFT
- A. A. Grusho
- N. A. Grusho
- M. I. Zabezhailo
- D. V. Smirnov
- E. E. Timonina
- S. Ya. Shorgin
Abstract
A change in the properties of collected data is commonly called data drift, a term that covers various kinds of shift in the data's characteristics.
Drift in the training data of an artificial intelligence system often reduces the effectiveness of machine learning and leads to erroneous decisions by systems built on those data. This makes three problems relevant: detecting drift in machine learning data, determining the moment at which drift arises, and assessing the consequences of changes in the training data. The paper proposes a method for detecting drift of a probability distribution in an arbitrary high-dimensional metric space. The method relies on the fact that, when drift occurs, the unknown probability distributions differ in different regions of the original space. A drift model consisting of two different probability distributions is considered. Using balls in the metric space as the basis of the method yields an efficient algorithm for computing the membership of data points in one of the balls associated with the different distributions of the drift model.
This property appears to be essential for detecting distribution drift in a high-dimensional space.
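The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a Euclidean metric, a single ball with a hand-picked center and radius, and a standard two-proportion z statistic as the comparison rule; all function names are hypothetical. It shows why balls are convenient: deciding whether a point belongs to a ball needs only one distance evaluation, regardless of dimension.

```python
import math
import random


def euclidean(x, y):
    """Assumed metric; any distance function on the data space could be used."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def in_ball(point, center, radius, dist=euclidean):
    """Ball membership: a single distance evaluation per point."""
    return dist(point, center) <= radius


def ball_fraction(sample, center, radius, dist=euclidean):
    """Fraction of a non-empty sample that falls inside the ball."""
    hits = sum(in_ball(p, center, radius, dist) for p in sample)
    return hits / len(sample)


def drift_z_score(reference, window, center, radius, dist=euclidean):
    """Two-proportion z statistic comparing the ball's hit rate in two samples.

    A large absolute value suggests that the two samples put different
    probability mass on the ball, i.e. that the distribution has changed.
    """
    n1, n2 = len(reference), len(window)
    p1 = ball_fraction(reference, center, radius, dist)
    p2 = ball_fraction(window, center, radius, dist)
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance == 0:
        # Both samples lie entirely inside or entirely outside the ball.
        return 0.0
    return (p2 - p1) / math.sqrt(variance)


def gaussian_sample(n, dim, shift, rng):
    """Hypothetical test data: n Gaussian points in R^dim with mean `shift`."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]
```

A possible use is to draw a reference sample and a later window, fix a ball around the origin with radius near the typical norm of the reference data, and flag drift when the absolute z score exceeds a chosen threshold. The choice of ball and threshold is left open here and would have to come from the paper itself.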
About this article
Cover Date
2022-11-30
DOI
10.14357/08696527220402
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
distribution drift; mathematical statistics; efficiently calculated algorithm
Authors
A. A. Grusho, N. A. Grusho, M. I. Zabezhailo, D. V. Smirnov, E. E. Timonina, and S. Ya. Shorgin
Author Affiliations
Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
Sberbank of Russia, 19 Vavilov Str., Moscow 117999, Russian Federation