Systems and Means of Informatics
2024, Volume 34, Issue 4, pp 73-84
DEVELOPING THE STRUCTURE OF SUPRACORPORA DATABASES
Abstract
The paper presents methods for developing the structure of supracorpora databases so that the results of parallel text analysis can be represented in more detail. The initial data structure for annotation is described. The methods make it possible (i) to mark up blocks of the original and translated text in more detail; (ii) to classify the features of a text block using multiple facets; (iii) to store data about lexical markers of text block features; and (iv) to store data about the irrelevance of text fragment pairs to a search query. Together, these capabilities improve the completeness and consistency of the final data, and the corresponding changes make the data structure more flexible. The proposed changes are independent of the goals and objectives of any specific study that may be conducted using supracorpora databases.
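As a rough illustration only, the four capabilities (i) to (iv) listed in the abstract could be modeled along the following lines. This is a minimal sketch, not the schema used in the paper: the class names (LexicalMarker, TextBlock, FragmentPair), all field names, the facet labels, the query identifier, and the sample sentence pair are hypothetical and chosen purely for demonstration.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalMarker:
    """Hypothetical record: a word or phrase that signals a block feature (iii)."""
    text: str
    feature: str


@dataclass
class TextBlock:
    """Hypothetical record: a marked-up span of the original or translation (i)."""
    text: str
    # Several independent facets per block, each with its own value (ii).
    facets: dict[str, str] = field(default_factory=dict)
    markers: list[LexicalMarker] = field(default_factory=list)


@dataclass
class FragmentPair:
    """Hypothetical record: an aligned pair of original and translated fragments."""
    original: str
    translation: str
    original_blocks: list[TextBlock] = field(default_factory=list)
    translation_blocks: list[TextBlock] = field(default_factory=list)
    # Identifiers of search queries for which this pair was judged irrelevant (iv).
    irrelevant_for: set[str] = field(default_factory=set)

    def is_relevant(self, query_id: str) -> bool:
        """Return False if the pair was marked irrelevant for the given query."""
        return query_id not in self.irrelevant_for


# Made-up sample data, for illustration only.
pair = FragmentPair(
    original="Il pleut, donc je reste.",
    translation="It is raining, so I am staying.",
)
block = TextBlock(
    text="donc",
    facets={"relation": "consequence", "position": "medial"},
)
block.markers.append(LexicalMarker(text="donc", feature="relation"))
pair.original_blocks.append(block)
pair.irrelevant_for.add("query-17")
```

Keeping the facets as a mapping rather than as fixed fields is one way to leave the structure independent of any particular study, since new facets can be added without changing the record definition.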
About this article
Cover Date
2024-12-10
DOI
10.14357/08696527240406
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
supracorpora database; parallel texts; text annotation; corpus linguistics
Authors
A. A. Goncharov
Author Affiliations
Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation