Systems and Means of Informatics
2017, Volume 27, Issue 1, pp 100-107
ON THE MAIN TYPES OF RELATEDNESS BETWEEN TEXT DOCUMENTS
- M. M. Charnine
- N. V. Somin
Abstract
This paper considers the relatedness of natural language texts based on textual features (fragments). Two types of relatedness are identified: explicit relatedness, when texts are linked by bibliographic references, and implicit relatedness, when texts are linked through common text fragments. The advantages and applications of implicit relatedness are discussed. It is shown that using implicit relatedness significantly widens the scope of text processing techniques based on the relatedness of texts. Measures of explicit and implicit relatedness are proposed. An experiment was conducted on a set of texts from the subject area of "computer graphics." The experiment showed that the two types of relatedness are correlated with each other. The authors found the text processing parameters at which the correlation was at its maximum, reaching about 55%. A plan for further development of the proposed text comparison method and refinement of the results is suggested.
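The abstract does not state the actual measures, so the following is only a minimal sketch of the kind of comparison it describes. It makes several assumptions that are not from the paper: implicit relatedness is approximated by Jaccard overlap of word n-grams, explicit relatedness is a binary "one document cites the other" indicator, and the two are compared with the Pearson coefficient over all document pairs. All function names, the `docs` and `cites` structures, and the parameter `n` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def shingles(text, n=3):
    """Return the set of word n-grams (text fragments) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def implicit_relatedness(a, b, n=3):
    """Jaccard overlap of word n-grams (assumed stand-in measure)."""
    sa, sb = shingles(a, n), shingles(b, n)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def explicit_relatedness(i, j, cites):
    """1.0 if either document cites the other, else 0.0 (assumed measure)."""
    return 1.0 if j in cites.get(i, set()) or i in cites.get(j, set()) else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def relatedness_correlation(docs, cites, n=3):
    """Correlate both measures over all unordered document pairs.

    docs:  dict mapping a document id to its text
    cites: dict mapping a document id to the set of ids it cites
    """
    pairs = list(combinations(sorted(docs), 2))
    explicit = [explicit_relatedness(i, j, cites) for i, j in pairs]
    implicit = [implicit_relatedness(docs[i], docs[j], n) for i, j in pairs]
    return pearson(explicit, implicit)
```

In a sketch like this, the fragment length `n` plays the role of a text processing parameter: one could call `relatedness_correlation` for several values of `n` and keep the value that gives the highest correlation, which is loosely analogous to the parameter search the abstract mentions.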
About this article
Cover Date
2017-03-30
DOI
10.14357/08696527170107
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
relatedness between texts; explicit relatedness; implicit relatedness; measure of relatedness; collection of texts; correlation
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation