Informatics and Applications
2014, Volume 8, Issue 2, pp 98-110
INFORMATION TECHNOLOGIES FOR CORPUS STUDIES:
UNDERPINNINGS FOR CROSS-LINGUISTIC DATABASE CREATION
- N. V. Buntman
- Anna A. Zaliznyak
- I. M. Zatsman
- M. G. Kruzhkov
- E. Yu. Loshchilova
- D. V. Sitchinava
Abstract
This paper considers an information technology for creating cross-linguistic databases of Russian texts
with their French translations (parallel texts). The underlying principles of the developed database provide
a unique combination of three types of bilingual search: lexical, grammatical, and lexico-grammatical.
A distinctive feature of the technology is the simultaneous creation of a Russian-French parallel subcorpus
within the Russian National Corpus and of a cross-linguistic database of Russian verbal lexico-grammatical
forms and their French functional equivalents. The subcorpus and the database have different levels of
alignment: the former is aligned at the level of sentences, and the latter at the level of constructions.
The academic relevance of the database lies in its support for the development of bilingual contrastive
grammar and in its role in creating a Russian grammar built on the modern empirical base and information
technologies of corpus linguistics. Its main practical application is improving the quality of machine
translation.
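
The two alignment levels and the three search types named in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: the record layout, the field names (`ru_lemma`, `ru_grammar`, `fr_equivalent`), the `search` function, the grammatical tags, and the three sentence pairs are all assumptions invented here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SentencePair:
    """Sentence-level alignment, as in the parallel subcorpus (hypothetical layout)."""
    ru: str
    fr: str


@dataclass(frozen=True)
class ConstructionPair:
    """Construction-level alignment, as in the database (hypothetical layout)."""
    sentence_id: int     # key of the sentence pair that contains the construction
    ru_form: str         # Russian verbal form as it occurs in the text
    ru_lemma: str        # its lemma, used for lexical search
    ru_grammar: str      # ad hoc grammatical tag, NOT the corpus tagset
    fr_equivalent: str   # French functional equivalent


# Invented sample data, not taken from the corpus.
SENTENCES: Dict[int, SentencePair] = {
    1: SentencePair("Она прочитала письмо.", "Elle a lu la lettre."),
    2: SentencePair("Он читал книгу.", "Il lisait un livre."),
    3: SentencePair("Я прочитаю письмо завтра.", "Je lirai la lettre demain."),
}

CONSTRUCTIONS: List[ConstructionPair] = [
    ConstructionPair(1, "прочитала", "прочитать", "pfv.past", "a lu"),
    ConstructionPair(2, "читал", "читать", "ipfv.past", "lisait"),
    ConstructionPair(3, "прочитаю", "прочитать", "pfv.fut", "lirai"),
]


def search(lemma: Optional[str] = None,
           grammar: Optional[str] = None) -> List[ConstructionPair]:
    """Filter construction pairs by lemma, by grammatical tag, or by both.

    lemma only    -> lexical search
    grammar only  -> grammatical search
    both          -> lexico-grammatical search
    neither       -> every record is returned
    """
    return [
        c for c in CONSTRUCTIONS
        if (lemma is None or c.ru_lemma == lemma)
        and (grammar is None or c.ru_grammar == grammar)
    ]


if __name__ == "__main__":
    # Lexico-grammatical query, then a lookup of the sentence-level context.
    for hit in search(lemma="прочитать", grammar="pfv.fut"):
        context = SENTENCES[hit.sentence_id]
        print(hit.ru_form, "->", hit.fr_equivalent, "|", context.ru, "|", context.fr)
```

In this sketch each construction record carries the identifier of its sentence pair, so a hit found at the construction level can be shown in its sentence-level context. Whether the actual database links the two levels this way is an assumption.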
About this article
Cover Date
2014-03-31
DOI
10.14357/19922264140210
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
parallel corpus; information technology; cross-linguistic databases; bilingual lexical grammar search;
corpus linguistics; contrastive grammar
Authors
N. V. Buntman, Anna A. Zaliznyak, I. M. Zatsman, M. G. Kruzhkov, E. Yu. Loshchilova,
and D. V. Sitchinava
Author Affiliations
- Faculty of Foreign Languages and Area Studies, M.V. Lomonosov Moscow State University, 31-a Lomonosov Str., Moscow 119192, Russian Federation
- Institute of Linguistics, Russian Academy of Sciences, 1-1 Bolshyi Kislovskiy pereulok, Moscow 125009, Russian Federation
- Institute of Informatics Problems, Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
- Institute of Russian Language, Russian Academy of Sciences, 18/2 Volkhonka Str., Moscow 119019, Russian Federation