Systems and Means of Informatics
2017, Volume 27, Issue 4, pp 164-176
METHODS OF FREQUENCY ANALYSIS OF CONNECTIVES TRANSLATIONS AND REVERSIBILITY OF STATISTICAL DATA GENERALIZATION
- I. M. Zatsman
- M. G. Kruzhkov
- E. Ju. Loshchilova
Abstract
The paper examines methods for frequency analysis of Russian connectives, including analysis of their translation models in Russian-French parallel texts.
The parallel texts are integrated into a supracorpora database (SCDB), which also contains bilingual annotations of translation correspondences. Each annotation records properties of the examined linguistic item (a Russian connective) together with properties of the corresponding item found in the translation. In the SCDB, these properties are organized as a faceted classification that describes the translation models from various perspectives. A characteristic feature of the frequency analysis methods implemented in the SCDB is the reversibility of the calculated statistical data: each calculated frequency value acts as a hyperlink to the list of annotations it is based on, and these annotations represent occurrences of the corresponding connectives in the parallel texts. The faceted classifications allow for multidimensional statistical analysis of the annotated connectives and translation models. The calculated statistical data are verifiable because every value can be traced directly to the annotations it is based on. The main goal of the paper is to describe methods of frequency analysis of connective translation models, including those that support reversibility of the calculated statistical data at different levels of generalization.
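The idea of reversible frequency data can be illustrated with a minimal sketch. It is not the SCDB implementation: the `Annotation` record, the `frequency_table` helper, the facet label, and the four sample annotations are all hypothetical and chosen only for illustration. The sketch assumes that reversibility amounts to keeping, for every count, the identifiers of the annotations that produced it, so that a count at any generalization level can be traced back to its sources.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record; field names are assumptions."""
    ann_id: int
    connective: str   # Russian connective
    translation: str  # corresponding item in the French text
    relation: str     # one illustrative facet value


def frequency_table(annotations, key):
    """Group annotation ids by key; a frequency is the length of each id list."""
    groups = defaultdict(list)
    for ann in annotations:
        groups[key(ann)].append(ann.ann_id)
    return dict(groups)


# Made-up sample data, not taken from the SCDB.
annotations = [
    Annotation(1, "но", "mais", "contrast"),
    Annotation(2, "но", "mais", "contrast"),
    Annotation(3, "но", "cependant", "contrast"),
    Annotation(4, "однако", "cependant", "concession"),
]

# Lower generalization level: (connective, translation) pairs.
by_pair = frequency_table(annotations, key=lambda a: (a.connective, a.translation))
# Higher generalization level: connective only.
by_connective = frequency_table(annotations, key=lambda a: a.connective)

for group, ids in by_pair.items():
    # The count is reported together with the ids it is derived from.
    print(group, len(ids), ids)
```

Because each table stores annotation identifiers rather than bare counts, regrouping by a different facet only changes the `key` function, and every reported value remains checkable against its underlying annotations.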
About this article
Title
METHODS OF FREQUENCY ANALYSIS OF CONNECTIVES TRANSLATIONS AND REVERSIBILITY OF STATISTICAL DATA GENERALIZATION
Journal
Systems and Means of Informatics
Volume 27, Issue 4, pp 164-176
Cover Date
2017-10-30
DOI
10.14357/08696527170413
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
supracorpora database; translation models; annotation of translation models; faceted classifications; corpus linguistics; generalization; reversibility of generalization process
Authors
I. M. Zatsman, M. G. Kruzhkov, and E. Ju. Loshchilova
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation