Systems and Means of Informatics
2023, Volume 33, Issue 4, pp 115-125
GRAPH n-GRAMS IN THE TEXT ATTRIBUTION PROBLEM
- N. D. Moskin
- A. A. Rogov
- A. A. Lebedev
Abstract
The paper presents results on modeling the structure of texts with a generalized context-dependent graph-theoretic model. The study focuses mainly on literary and folklore texts for which the attribution problem arises; for example, there are many such texts among the works of the famous Russian writer F. M. Dostoevsky. The authors show how hybrid models can be built from dependency trees, graph models of the syntactic structure of links between simple sentences in a multicomponent complex sentence, and "strong links" graphs of word combinations of different grammatical classes. Such models make it possible to construct new informative features that are potentially applicable to text attribution. One example is the frequency of occurrence of graph n-grams, which generalize ordinary n-grams, syntactic n-grams, and other similar constructions used in stylistic studies. The article also discusses the format for storing texts, their generalized graph models, and graph n-grams.
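As a rough illustration of the feature described in the abstract, the sketch below counts label sequences along directed paths in a labeled graph. It is an assumption for illustration only, not the authors' definition: the abstract does not specify how graph n-grams are formed, so here a graph n-gram is taken to be the sequence of vertex labels along a simple directed path of n vertices. The function name `path_ngrams`, the toy part-of-speech labels, and the edge lists are all hypothetical. Under this reading, a linear chain of words yields ordinary n-grams and a dependency tree yields syntactic n-grams.

```python
from collections import Counter


def path_ngrams(labels, edges, n):
    """Count label sequences along simple directed paths of n vertices.

    labels: dict mapping vertex id -> label (e.g., a part-of-speech tag)
    edges:  list of (source, target) pairs between known vertex ids
    n:      number of vertices in each path (the "n" of the n-gram)
    """
    adjacency = {v: [] for v in labels}
    for src, dst in edges:
        adjacency[src].append(dst)

    counts = Counter()

    def extend(path):
        # A path of n vertices contributes one n-gram of labels.
        if len(path) == n:
            counts[tuple(labels[v] for v in path)] += 1
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:  # keep paths simple (no repeated vertices)
                extend(path + [nxt])

    for start in labels:
        extend([start])
    return counts


# Toy example (hypothetical): the three-word phrase "the cat sat".
labels = {0: "DET", 1: "NOUN", 2: "VERB"}

# Linear word order: the -> cat -> sat (gives ordinary n-grams).
chain_edges = [(0, 1), (1, 2)]

# Dependency tree, head -> dependent: sat -> cat -> the (gives syntactic n-grams).
tree_edges = [(2, 1), (1, 0)]

chain_bigrams = path_ngrams(labels, chain_edges, 2)
tree_bigrams = path_ngrams(labels, tree_edges, 2)
tree_trigrams = path_ngrams(labels, tree_edges, 3)
```

Relative frequencies of such sequences, computed over a whole text, could then serve as entries of a feature vector for comparing texts. A hybrid model of the kind described above would combine several edge types in one graph, which this sketch does not attempt.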
About this article
Cover Date
2023-12-10
DOI
10.14357/08696527230411
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
artificial intelligence; text attribution; graph; metagraph; hybrid graph; folklore text; literary text; graph n-gram
Authors
N. D. Moskin, A. A. Rogov, and A. A. Lebedev
Author Affiliations
Petrozavodsk State University, 33 Lenina Prosp., Petrozavodsk 185910, Russian Federation