Systems and Means of Informatics
2021, Volume 31, Issue 3, pp 101-112
CONCEPTUAL FRAMEWORK FOR SUPRACORPORA DATABASES
Abstract
The paper provides an overview of the concept, main structural constituents, and functions of supracorpora databases (SCDBs). Supracorpora databases are a novel type of structured information resource that significantly expands the capabilities of linguistic text corpora, parallel corpora in particular. The paper outlines the principal features and limitations of parallel corpora and demonstrates how SCDBs extend these features and overcome the limitations. Supracorpora databases allow linguistic experts to establish, record, and annotate translation correspondences between language units in the source and target texts, relying on faceted classification categories that the researchers compose themselves according to their requirements. The article also describes the general structure of the SCDB architecture developed at FRC CSC RAS, which incorporates corpus and subcorpus constituents that interact with one another as parts of a common database.
About this article
Cover Date
2021-11-10
DOI
10.14357/08696527210309
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
corpus linguistics; supracorpora database; parallel corpus; linguistic annotation; information technologies; faceted classification
Authors
M. G. Kruzhkov
Author Affiliations
Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation