Systems and Means of Informatics
2014, Volume 24, Issue 1, pp 224-243
- M.M. Gershkovich
- T.K. Birukova
An approach to identifying informational objects (IOs) in automated
information systems used for data collection, storage, and processing is
presented. Such systems consist of multiple nodes and acquire data from
multiple sources. In most cases, the data array of an information system
takes the form of a continuously filled event log. Each event record
includes characteristics of the event's participant (the IO) and of the event's
conditions. To solve analytical problems related to IOs, one must
identify them, i.e., determine the set of IOs that, with a certain probability,
represent the same entity. The paper defines the typical IO identification tasks
that arise in the development of large-scale information systems: IO fusion and
IO clustering (forming an aggregate of IOs that are similar with respect to
certain criteria). The identification task is closely connected to the task of
identifying links between IOs, since the probability that IOs are identical is
higher if each IO is associated with another object. Methods for solving these
tasks are presented, special features of IO identification in a flow of events
are studied, and a correlation search method for detecting associations between
IOs is described. A method for comparing proper names that accounts for probable
distortions (phonetic and transcriptional) and misprints is presented. The
efficacy of applying Cyrillic and Latin first name and second name blocks
simultaneously for personal identification is substantiated, and methods for
transcription from Cyrillic to Latin and vice versa are presented.
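As a rough illustration of comparing proper names across scripts while tolerating misprints, the following is a minimal sketch and not the authors' method. It assumes a simplified, hypothetical Cyrillic-to-Latin table (`CYR_TO_LAT`) and uses the plain Levenshtein edit distance named in the key words; the function names `transliterate`, `levenshtein`, and `name_similarity` are chosen here for illustration only.

```python
# Hypothetical, simplified Cyrillic-to-Latin table (an assumption for
# illustration; it is not the transcription scheme described in the paper).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(name: str) -> str:
    """Lower-case the name and map Cyrillic letters to Latin ones.

    Characters that are not in the table (including Latin letters)
    are passed through unchanged.
    """
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in name.lower())


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after bringing both names to Latin script."""
    a, b = transliterate(a), transliterate(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Under these assumptions, `name_similarity("Иванов", "Ivanov")` treats the two spellings as the same name, while a single misprint such as `"Ivanof"` lowers the score only slightly. A threshold on this score would be one possible way to decide whether two records are candidates for the same IO; the threshold value would have to be chosen empirically.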
Key words
identification of informational objects; identification of objects;
correlation search; search for associations; identity of objects; fusion of
informational objects; fusion of objects; text attributes; data distortions;
phonetic distortions; transcriptional errors; Latin to Cyrillic transcription;
Cyrillic to Latin transcription; Metaphone; Levenshtein distance; spread
systems; area-spread systems; hierarchical systems; flow of events
M.M. Gershkovich and T.K. Birukova
Author Affiliations
Institute of Informatics Problems, Russian Academy of Sciences, 44-2 Vavilov Str.,
Moscow 119333, Russian Federation