Systems and Means of Informatics
2018, Volume 28, Issue 4, pp. 168-181
- O. Yu. Inkova
- M. G. Kruzhkov
This article presents a new method for describing the structure of multiword connectives, implemented in the Supracorpora database (SCDB) of connectives. The structure of connectives remains underinvestigated, and there are no established criteria for determining the boundaries of connectives and their components. The proposed method is based on a cognitive-semantic approach that treats multiword connectives as more or less free word combinations generated in the process of speech. A two-tier faceted classification is proposed that allows annotating, on the one hand, specific tokens of connectives in texts (context annotation) and, on the other hand, the inner structure of connectives (structural annotation). The structural annotation is based on two aspects: the structural type of a connective and its structural components. Based on the proposed annotation method, a system of cross-clusters is implemented that extends the search and statistical capabilities of the SCDB. In addition, the method allows researchers to eliminate subjectivity during the annotation process and to fill some gaps in linguistic knowledge, for example, by gathering new data on the combinatorial capabilities of Russian connectives.
connectives; linguistic items structure; linguistic items variation; corpus linguistics; annotation; faceted classification; supracorpora databases
O. Yu. Inkova and M. G. Kruzhkov
Institute of Informatics Problems, Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation