Systems and Means of Informatics
2018, Volume 28, Issue 4, pp 156-167
SUPRACORPORA DATABASE OF CONNECTIVES: DESIGN-ORIENTED EVOLUTION OF THE TERM SYSTEM
- I. M. Zatsman
- M. G. Kruzhkov
Abstract
This article examines the design-oriented evolution of the term system for supracorpora databases (SCDBs), which represent a new category of information resources in linguistics. An SCDB is based on parallel texts, i.e., texts placed alongside their translations and aligned with them at the sentence level. Although SCDBs are designed for annotating a wide variety of linguistic items and their correspondences, this article specifically considers the annotation of connectives. The annotation-centered design of SCDBs has led to the emergence of new entities and notions in computational linguistics, and at the beginning of 2017, a custom term system was proposed for them. On the one hand, linguists use the proposed terms to describe new knowledge generated by annotating and investigating linguistic units. On the other hand, these terms serve as a basis for designing the SCDB architecture and the associated dataware, lingware, and software. Since the terminology was first described, the range of tasks accomplished with SCDBs has expanded significantly; hence, the initial design-oriented term system needs further development.
About this article
Cover Date
2018-11-30
DOI
10.14357/08696527180415
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
supracorpora databases; term systems; annotation of linguistic units; parallel texts; corpus linguistics; connectives
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation