Informatics and Applications

2021, Volume 15, Issue 4, pp 79-86

STATISTICS AND CLUSTERS FOR DETECTION OF ANOMALOUS INSERTIONS IN BIG DATA ENVIRONMENT

  • A. A. Grusho
  • N. A. Grusho
  • M. I. Zabezhailo
  • D. V. Smirnov
  • E. E. Timonina
  • S. Ya. Shorgin

Abstract

The paper develops algorithms for reducing the number of "false alarms" when searching for anomalies in complex heterogeneous sequences of objects (Big Data). Traditionally, in mathematical statistics, this reduction is achieved by minimizing the probability of a "false alarm." However, in anomaly detection problems (rare intrusions of anomalous data), this approach increases the probability of missing the anomalies being sought. To avoid missing them, the paper proposes the opposite: criteria designed for minimal computational complexity are allowed a high probability of "false alarms," relying on the fact that the number of objects selected by such criteria is much smaller than the number of original objects in Big Data. The selected objects can then be grouped into a single cluster, and additional information about the objects in the cluster can be used to identify the required anomalies. The point of this approach is that object characteristics that are more expensive to compute, used to discard "false alarms," do not require large computational resources on a cluster that is small relative to the original data. It is shown that, when certain conditions are satisfied, the order in which additional information is used does not affect the result of filtering "false alarms." The results for the filtering algorithm on a sequence of objects are generalized to filtering "false alarms" that take the form of causal schemes in the initial data. Known schemes show how "false alarms" can be filtered by identifying only fragments of schemes.
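
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the object fields (`value`, `source`, `payload`), the threshold, and the function names `cheap_screen` and `filter_false_alarms` are all hypothetical. The sketch also assumes one simple condition under which filter order cannot matter, namely that each additional check depends only on the object itself; the paper's own conditions may differ.

```python
from typing import Callable, Dict, List, Sequence

Obj = Dict[str, object]
Predicate = Callable[[Obj], bool]


def cheap_screen(objects: Sequence[Obj], cheap_test: Predicate) -> List[Obj]:
    """Stage 1: an inexpensive criterion that tolerates many false alarms.

    It is meant to keep every true anomaly while still returning far
    fewer objects than the original data contains.
    """
    return [o for o in objects if cheap_test(o)]


def filter_false_alarms(cluster: Sequence[Obj],
                        extra_tests: Sequence[Predicate]) -> List[Obj]:
    """Stage 2: apply costlier checks, but only to the small cluster."""
    kept = list(cluster)
    for test in extra_tests:
        kept = [o for o in kept if test(o)]
    return kept


# Hypothetical toy data; the field names are assumptions for illustration.
objects: List[Obj] = [
    {"id": 0, "value": 1.0, "source": "a", "payload": "ok"},
    {"id": 1, "value": 9.5, "source": "b", "payload": "ok"},
    {"id": 2, "value": 8.7, "source": "a", "payload": "bad"},
    {"id": 3, "value": 2.1, "source": "b", "payload": "bad"},
    {"id": 4, "value": 9.9, "source": "b", "payload": "bad"},
]


def cheap_test(o: Obj) -> bool:
    # Deliberately loose threshold: selects objects 1, 2 and 4.
    return float(o["value"]) > 8.0


def source_test(o: Obj) -> bool:
    # Stand-in for additional information that is costlier to obtain.
    return o["source"] == "b"


def payload_test(o: Obj) -> bool:
    # Second stand-in for additional information.
    return o["payload"] == "bad"


cluster = cheap_screen(objects, cheap_test)
result_ab = filter_false_alarms(cluster, [source_test, payload_test])
result_ba = filter_false_alarms(cluster, [payload_test, source_test])

# Each check looks only at the object itself, so the surviving set is the
# intersection of the checks and does not depend on their order.
print([o["id"] for o in result_ab])
print([o["id"] for o in result_ba])
```

Because the costlier checks run only on the three screened objects rather than on all five, the saving grows with the ratio of the original data size to the cluster size.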
