Systems and Means of Informatics
2014, Volume 24, Issue 1, pp 224-243
THE TASKS OF IDENTIFICATION OF INFORMATIONAL OBJECTS
IN AREA-SPREAD DATA ARRAYS
- M.M. Gershkovich
- T.K. Birukova
Abstract
An approach to the identification of informational objects (IO) in
automated information systems used for data collection, storage, and
processing is presented. Such systems consist of multiple nodes and acquire
data from multiple sources. In most cases, the data array of an information
system takes the form of a continuously filled event log. Each event record
includes characteristics of the event's participant (the IO) and of the
conditions of the event. To solve analytical problems related to IOs, one must
identify them, i.e., determine the set of IOs that, with a certain probability,
represent the same entity. The paper defines the typical IO identification
tasks that arise in the development of large-scale information systems: IO
fusion and IO clustering (forming an aggregate of IOs that are similar with
respect to certain criteria). The identification task is closely connected
with the task of identifying links between IOs, since the probability that IOs
are identical is higher if each of them is associated with the same other
object. Methods for solving these tasks are presented, the special features of
IO identification in a flow of events are studied, and the correlation search
method for detecting associations between IOs is described. A method for
comparing proper names that accounts for probable distortions (phonetic and
transcriptional) and misprints is presented. The efficacy of using Cyrillic
and Latin first name and surname blocks simultaneously for personal
identification is substantiated, and methods for transcription from Cyrillic
to Latin and vice versa are presented.
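
The comparison of proper names across Cyrillic and Latin spellings mentioned
above can be illustrated with a minimal sketch. It is not the authors' method:
the transcription table is a simplified, hypothetical one, the distance is the
optimal string alignment variant of the Damerau-Levenshtein distance, and the
function names and the threshold `max_dist` are assumptions made for this
example only. Phonetic coding (Metaphone) is not reproduced here.

```python
# Simplified Cyrillic-to-Latin table (an assumption, not the paper's scheme).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(name: str) -> str:
    """Map a Cyrillic name to lowercase Latin; other characters pass through."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in name.lower())


def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions,
    and transpositions of two adjacent characters (each costs 1)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def names_match(cyrillic: str, latin: str, max_dist: int = 2) -> bool:
    """Hypothetical rule: names match if the transliterated Cyrillic form is
    within max_dist edits of the Latin form."""
    return osa_distance(transliterate(cyrillic), latin.lower()) <= max_dist
```

For instance, `names_match("Бирюкова", "Birukova")` compares `biryukova` with
`birukova`, which differ by one deleted character. A practical system would
tune the threshold to name length and combine this score with other
attributes of the event record.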
About this article
Cover Date
2013-11-30
DOI
10.14357/08696527140114
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
identification of informational objects; identification of objects;
correlation search; search for associations; identity of objects; fusion of
informational objects; fusion of objects; text attributes; data distortions;
phonetic distortions; transcriptional errors; Latin to Cyrillic transcription;
Cyrillic to Latin transcription; Metaphone; Levenshtein distance; spread
systems; area-spread systems; hierarchical systems; flow of events
Authors
M.M. Gershkovich and T.K. Birukova
Author Affiliations
Institute of Informatics Problems, Russian Academy of Sciences, 44-2 Vavilov Str.,
Moscow 119333, Russian Federation