Informatics and Applications
2018, Volume 12, Issue 4, pp 96-105
USING SUPRACORPORA DATABASES FOR QUANTITATIVE ANALYSIS OF MACHINE TRANSLATIONS
- N. V. Buntman
- A. A. Goncharov
- I. M. Zatsman
- V. A. Nuriev
Abstract
The paper discusses an information technology that supports the expert evaluation of machine translations. The
technology has been developed to meet the following conditions: (i) every translated context contains a connective;
(ii) the connectives may be either one-word (khotya 'although,' a 'and') or multiword (da esche 'and besides,'
no zato 'but instead'); and (iii) the words making up a given connective may be non-adjacent, separated by a gap
(esli (space) tak 'if (space) then'). With this technology, the expert evaluation of machine translations proceeds
through three main stages: (i) linguistic annotation of machine translations in a supracorpora database;
(ii) quantitative processing of the annotations; and (iii) linguistic analysis of the annotations and quantitative
data. The paper describes the technological aspects of the first two stages. All examples given involve multiword
connectives. The source sentences chosen for machine translation were collected from literary texts.
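As an illustration of conditions (ii) and (iii) and of the quantitative stage, the following is a minimal sketch, not the authors' implementation. It locates one-word, multiword, and discontinuous connectives in a tokenized sentence and tallies them. The inventory format, the `MAX_GAP` limit, the function names, and the sample strings are all assumptions introduced here for illustration; the abstract does not describe the database schema or the matching procedure.

```python
import re
from collections import Counter

# Hypothetical inventory (an assumption, not the paper's data model):
# each entry is (component words, whether other words may intervene).
CONNECTIVES = [
    (("khotya",), False),
    (("a",), False),
    (("da", "esche"), False),
    (("no", "zato"), False),
    (("esli", "tak"), True),   # discontinuous: "esli ... tak"
]

MAX_GAP = 10  # assumed upper bound on the number of intervening tokens


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def find_connectives(sentence):
    """Return (connective, token positions) for every match in the sentence."""
    tokens = tokenize(sentence)
    hits = []
    for parts, gap_allowed in CONNECTIVES:
        for start, token in enumerate(tokens):
            if token != parts[0]:
                continue
            positions = [start]
            for part in parts[1:]:
                prev = positions[-1]
                # Adjacent components must sit at prev + 1; discontinuous
                # ones may sit up to MAX_GAP tokens further to the right.
                last = prev + 1 + (MAX_GAP if gap_allowed else 0)
                window = range(prev + 1, min(last, len(tokens) - 1) + 1)
                nxt = next((i for i in window if tokens[i] == part), None)
                if nxt is None:
                    break
                positions.append(nxt)
            else:
                hits.append((" ".join(parts), tuple(positions)))
    return hits


def count_connectives(sentences):
    """Tally how often each connective occurs across the sentences."""
    counts = Counter()
    for sentence in sentences:
        for name, _positions in find_connectives(sentence):
            counts[name] += 1
    return counts


# Invented toy strings, used only to exercise the matcher.
samples = ["Esli ty ustal, tak otdokhni", "No zato on prishel", "No on zato"]
print(count_connectives(samples))
```

In the third toy string the components of no zato are not adjacent, so no match is recorded, whereas esli ... tak is matched across the intervening tokens. A real annotation workflow would rely on expert judgment rather than string matching alone.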
About this article
Cover Date
2018-12-30
DOI
10.14357/19922264180414
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
supracorpora database; machine translation; classification of errors; technology supporting expertise; linguistic annotation; corpus linguistics; connectives
Authors
N. V. Buntman, A. A. Goncharov, I. M. Zatsman, and V. A. Nuriev
Author Affiliations
M. V. Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow 119991, Russian Federation
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation