Informatics and Applications
2018, Volume 12, Issue 3, pp 83-90
STATISTICAL ANALYSIS OF LANGUAGE SPECIFICITY OF CONNECTIVES BASED ON PARALLEL TEXTS
- O. Yu. Inkova
- M. G. Kruzhkov
Abstract
In recent decades, language specificity in Russian has attracted considerable attention from researchers, although until recently it had not been thoroughly examined using corpus-based methods. This paper presents a new method for investigating the language specificity of Russian connectives based on statistical analysis of annotated parallel texts. Russian-French and French-Russian parallel texts are processed using the Supracorpora Database (SCDB) of Connectives, which is designed specifically for annotating translation correspondences (TCs) found in parallel texts. Each TC includes annotations of a Russian connective and its translation equivalent (TE), which makes it possible to obtain statistical data on various translation models (TMs) based on several proposed parameters of language specificity of connectives. As an example, the language specificity of two Russian connectives, или and а то, is examined. Based on the proposed statistical parameters, it is demonstrated that или has a very low degree of language specificity in the context of the Russian-French language pair, whereas а то is a highly language-specific connective. The results of this research are applicable to informatics (machine translation and statistical analysis of textual data) and to the comparative study of languages, including lexical typology, lexicography, and the theory and practice of translation.
About this article
Cover Date
2018-08-30
DOI
10.14357/19922264180312
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
supracorpora databases; statistical analysis; contrastive corpus analysis; language specificity; parallel corpora; linguistic information resources; connectives; discourse relations; semantics
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation