Systems and Means of Informatics

2023, Volume 33, Issue 4, pp 115-125

GRAPH n-GRAMS IN THE TEXT ATTRIBUTION PROBLEM

  • N. D. Moskin
  • A. A. Rogov
  • A. A. Lebedev

Abstract

The paper presents research on modeling the structure of texts with a generalized context-dependent graph-theoretic model. The study focuses mainly on literary and folklore texts for which the attribution problem arises; many such texts are found, for example, among the works of the famous Russian writer F. M. Dostoevsky. The authors show how to build hybrid models from dependency trees, graph models of the syntactic structure of links between simple sentences in a multicomponent complex sentence, and "strong links" graphs of word combinations of different grammatical classes. Such models make it possible to construct new informative features that are potentially applicable to text attribution. One example is the frequency of occurrence of graph n-grams, which generalize ordinary n-grams, syntactic n-grams, and other similar constructions used in stylistic studies. The article also discusses the format for storing texts, their generalized graph models, and graph n-grams.
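To make the idea of graph n-gram frequencies more concrete, the following minimal sketch counts label sequences along directed paths of n vertices in a labeled graph. The abstract does not give the paper's formal definition, so the path-based reading used here is an assumption, as are the function name graph_ngrams, the part-of-speech labels, and the toy sentence. Under this reading, a linear chain of words yields ordinary n-grams, while a dependency tree with head-to-dependent edges yields syntactic n-grams.

```python
from collections import Counter


def graph_ngrams(labels, edges, n):
    """Count label sequences along all directed simple paths of n vertices.

    labels: dict mapping vertex id -> label (e.g. a part-of-speech tag)
    edges:  iterable of (source, target) vertex pairs
    This path-based definition is an assumption, not the paper's own.
    """
    successors = {v: [] for v in labels}
    for src, dst in edges:
        successors[src].append(dst)

    counts = Counter()

    def extend(path):
        # A path with n vertices is one occurrence of a graph n-gram.
        if len(path) == n:
            counts[tuple(labels[v] for v in path)] += 1
            return
        for nxt in successors[path[-1]]:
            if nxt not in path:  # keep paths simple (no repeated vertex)
                extend(path + [nxt])

    for start in labels:
        extend([start])
    return counts


# Hypothetical toy sentence "the cat sat" with assumed POS labels.
labels = {0: "DET", 1: "NOUN", 2: "VERB"}
chain_edges = [(0, 1), (1, 2)]  # linear word order
tree_edges = [(2, 1), (1, 0)]   # dependency edges, head -> dependent

chain_bigrams = graph_ngrams(labels, chain_edges, 2)
tree_bigrams = graph_ngrams(labels, tree_edges, 2)
```

On the chain the counted bigrams follow word order (DET NOUN, NOUN VERB), whereas on the tree they follow head-to-dependent links (VERB NOUN, NOUN DET). Normalizing such counts by their total would give relative frequencies usable as stylistic features.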
