Systems and Means of Informatics
2018, Volume 28, Issue 4, pp 168-181
METHOD FOR DESCRIPTION OF MULTIWORD CONNECTIVES IN SUPRACORPORA DATABASES
- O. Yu. Inkova
- M. G. Kruzhkov
Abstract
This article presents a new method for describing the structure of multiword connectives, implemented in the Supracorpora database (SCDB) of connectives. The structure of connectives is currently underinvestigated, and criteria for determining the boundaries of connectives and of their components are lacking. The proposed method is based on a cognitive-semantic approach that treats multiword connectives as more or less free word combinations generated in the process of speech. A two-tier faceted classification is proposed that allows annotating, on the one hand, specific tokens of connectives in texts (context annotation) and, on the other hand, the inner structure of connectives (structural annotation). The structural annotation covers two aspects: the structural type of a connective and its structural components. Based on the proposed annotation method, a system of cross-clusters is implemented that extends the search and statistical capabilities of SCDB. In addition, the method helps researchers reduce subjectivity in the annotation process and fill some gaps in linguistic knowledge, for example, by gathering new data on the combinatorial capabilities of Russian connectives.
About this article
Cover Date
2018-11-30
DOI
10.14357/08696527180416
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
connectives; linguistic items structure; linguistic items variation; corpus linguistics; annotation; faceted classification; supracorpora databases
Authors
O. Yu. Inkova and M. G. Kruzhkov
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation