Informatics and Applications
2014, Volume 8, Issue 2, pp 130-144
The paper outlines the technology used to determine the degree of similarity of information objects,
which are represented by text or graphic images. Objects are formalized by probabilistic models. The structure of
the model is set by an algebra on a minimum set of graphic components of an object. Quantitative characteristics
of the structure of objects are the probability distributions on the algebra. The amount of information in objects is
estimated by entropy. The similarity measure of information objects is based on entropy. The paper describes the
method of estimating the proximity of text and graphic objects and provides several examples of implementing the
estimation algorithms. The developed method is shown to be more efficient than the methods
described in the literature. The technology used to form images of information objects and to compare their
semantic content is universal. It is possible to adapt the technology to the meaningful characteristics of objects
being analyzed.
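As a rough illustration of what an entropy-based similarity measure over probability distributions can look like, the sketch below estimates a distribution from the components of each object, computes Shannon entropy, and derives a similarity score from the Jensen-Shannon divergence. This is an assumption made for illustration only: the abstract does not specify the paper's algebra or its exact measure, so Jensen-Shannon divergence is substituted here, and the names `distribution`, `entropy`, and `js_similarity` are hypothetical.

```python
import math
from collections import Counter


def distribution(components):
    """Empirical probability distribution over the components of an object
    (e.g. words of a text or quantized pixel values of an image)."""
    counts = Counter(components)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def entropy(p):
    """Shannon entropy in bits of a distribution given as {component: probability}."""
    return -sum(prob * math.log2(prob) for prob in p.values() if prob > 0)


def js_similarity(p, q):
    """Similarity in [0, 1] defined as 1 minus the Jensen-Shannon divergence (base 2).

    Assumption: this stands in for the paper's measure, which the abstract
    does not define.
    """
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the union of components.
    mixture = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    divergence = entropy(mixture) - 0.5 * (entropy(p) + entropy(q))
    # Clamp to guard against tiny floating-point excursions outside [0, 1].
    return 1.0 - max(0.0, min(1.0, divergence))


if __name__ == "__main__":
    text_a = "the cat sat on the mat".split()
    text_b = "the cat lay on the rug".split()
    p, q = distribution(text_a), distribution(text_b)
    print("entropy A:", entropy(p))
    print("entropy B:", entropy(q))
    print("similarity:", js_similarity(p, q))
```

With this choice, identical distributions score 1 and distributions with no shared components score 0; other divergences could be substituted without changing the overall structure.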
Institute of Informatics Problems, Russian Academy of Sciences
Key words
information object; text; image; probabilistic model; semantic similarity; entropy; measure of similarity
L.A. Kuznetsov,
Russian Presidential Academy of National Economy and Public Administration (Lipetsk Branch), 3 Internatsional’naya
Str., Lipetskaya oblast, Lipetsk 398050, Russian Federation