Systems and Means of Informatics

2017, Volume 27, Issue 1, pp 100-107

ON THE MAIN TYPES OF RELATEDNESS BETWEEN TEXT DOCUMENTS

  • M. M. Charnine
  • N. V. Somin

Abstract

This paper considers the relatedness of natural language texts based on textual features (fragments). Two types of relatedness are identified: explicit relatedness, when the texts are linked by bibliographic references, and implicit relatedness, when the texts are linked through common text fragments. The advantages and applications of implicit relatedness are discussed. It is shown that the use of implicit relatedness significantly broadens the applicability of text processing techniques based on the relatedness of texts. Measures of explicit and implicit relatedness are proposed. An experiment was conducted on a set of texts from the subject area of "computer graphics." The experiment showed that the two types of relatedness are correlated with each other. The authors identified the text processing parameters at which the correlation is maximal, reaching about 55%. A plan for further development of the proposed text comparison method and for refinement of the results is suggested.
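The abstract does not define the proposed measures, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes that implicit relatedness can be approximated by the Jaccard overlap of word n-gram fragments, that explicit relatedness can be approximated by the Jaccard overlap of reference lists, and that the two are compared with a Pearson correlation over document pairs. All function names and the fragment length n are hypothetical.

```python
import math
import re


def fragments(text, n=3):
    """Return the set of word n-grams in a text (assumed notion of 'fragment')."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def implicit_relatedness(text_a, text_b, n=3):
    """Assumed measure: share of common text fragments between two texts."""
    return jaccard(fragments(text_a, n), fragments(text_b, n))


def explicit_relatedness(refs_a, refs_b):
    """Assumed measure: share of common bibliographic references."""
    return jaccard(set(refs_a), set(refs_b))


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if undefined."""
    if len(xs) != len(ys) or len(xs) < 2:
        return 0.0
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

Under these assumptions, one would compute both measures for every pair of documents in a collection, pass the two resulting lists to pearson, and vary n to look for the setting at which the correlation is highest.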
