Systems and Means of Informatics

2022, Volume 32, Issue 1, pp 160-167

SEARCH OF ANOMALIES IN BIG DATA

  • A. A. Grusho
  • N. A. Grusho
  • M. I. Zabezhailo
  • D. V. Smirnov
  • E. E. Timonina
  • S. Ya. Shorgin

Abstract

When big data are searched for a particular object, the central question is whether the available information is sufficient to identify it: under noise, a search method may miss the sought object or, conversely, point to objects that possess its features only by chance. The paper discusses a simple approach to estimating the solvability of the problem of finding the required information in big data under weak assumptions about the informativeness of the identifying features of search objects. In the simplest case, big data consist of a set of objects, each described by a set of parameters. The domain of each parameter forms its own information space. Parameter values help to identify the sought object and to filter out false ones. If there are few parameters, unambiguous identification of the desired object is possible only under stronger restrictions on the volume of big data. Since it is not known in advance whether the desired object can be identified unambiguously, it is necessary to estimate, at least approximately, the limits on the volume of big data within which unambiguous identification of the desired information is possible. For such estimates, it is proposed to use limit theorems of probability theory in the series scheme.
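
As an illustration only, and not the authors' method, an estimate of this kind can be sketched under strong simplifying assumptions: each of `n_objects` false objects matches each of the `n_params` features of the sought object independently and with the same probability `p_match`, and noise never hides the sought object itself. The number of accidental matches is then binomial, and when `n_objects` grows while `p_match ** n_params` shrinks (a series scheme), the classical Poisson limit theorem gives exp(-lambda), with lambda = n_objects * p_match ** n_params, as the approximate probability that no false object matches. All function names and parameter values below are hypothetical.

```python
import math


def expected_false_matches(n_objects: int, n_params: int, p_match: float) -> float:
    """Mean number of false objects that match every feature by chance.

    Assumption (not from the paper): features match independently,
    each with the same probability p_match.
    """
    if not 0.0 < p_match <= 1.0:
        raise ValueError("p_match must be in (0, 1]")
    return n_objects * p_match ** n_params


def prob_unambiguous(n_objects: int, n_params: int, p_match: float) -> float:
    """Poisson approximation to P(no false object matches all features)."""
    return math.exp(-expected_false_matches(n_objects, n_params, p_match))


def max_data_volume(n_params: int, p_match: float, min_prob: float) -> int:
    """Largest n_objects for which prob_unambiguous stays >= min_prob.

    Solves exp(-n * p_match ** n_params) >= min_prob for n.
    """
    if not 0.0 < min_prob < 1.0:
        raise ValueError("min_prob must be in (0, 1)")
    if not 0.0 < p_match <= 1.0:
        raise ValueError("p_match must be in (0, 1]")
    return math.floor(-math.log(min_prob) / p_match ** n_params)


if __name__ == "__main__":
    # Hypothetical setting: features that match by chance half of the time.
    for k in (10, 20, 30):
        print(k, max_data_volume(n_params=k, p_match=0.5, min_prob=0.9))
```

Under these assumptions, ten features with `p_match = 0.5` allow only about a hundred objects at an identification probability of 0.9, and each additional feature roughly doubles that bound, which matches the qualitative claim that fewer parameters force stronger restrictions on the data volume.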
