Systems and Means of Informatics
2020, Volume 30, Issue 4, pp 124-137
MACHINE TRANSLATION: INDICATOR-BASED EVALUATION OF TRAINING PROGRESS IN NEURAL PROCESSING
- A. Yu. Egorova
- I. M. Zatsman
- M. G. Kruzhkov
- V. A. Nuriev
Abstract
The paper presents data collected while observing the training progress of a neural machine translation (NMT) engine. The observed progress was evaluated qualitatively using a set of indicators. Two hundred and fifty text fragments in Russian served as the experimental material. Every month for one year, these fragments were translated into French using Google's publicly available NMT engine. Language experts recorded and annotated the resulting translations in a supracorpora database, which yielded a series of 12 annotated translations for each of the 250 Russian fragments. The annotations include labels of translation errors, which enable researchers to determine the types of NMT instability according to whether translation quality changes or stays the same. The goal of this paper is to describe the newly developed indicator-based approach and to give an example of its application to evaluating the training progress of a neural network.
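As a rough illustration of how a series of monthly annotations could be turned into a per-fragment label, the sketch below classifies one fragment by how its number of annotated errors changes over time. It is a minimal sketch under stated assumptions: the function name `classify_series`, the use of error counts as the quality signal, and the four category names are hypothetical and are not the indicators or instability types defined in the paper.

```python
from typing import List


def classify_series(error_counts: List[int]) -> str:
    """Label one fragment's monthly series of annotated error counts.

    Assumption: fewer annotated errors means better translation quality.
    The category names are illustrative only, not the paper's taxonomy.
    """
    if len(error_counts) < 2:
        raise ValueError("need at least two monthly observations")

    # Month-to-month change in the number of annotated errors.
    deltas = [later - earlier
              for earlier, later in zip(error_counts, error_counts[1:])]

    if all(d == 0 for d in deltas):
        return "no_change"      # error count never changes
    if all(d <= 0 for d in deltas):
        return "improvement"    # error count never rises and falls at least once
    if all(d >= 0 for d in deltas):
        return "degradation"    # error count never falls and rises at least once
    return "fluctuation"        # error count both rises and falls


# Hypothetical 12-month series for four fragments (made-up numbers).
examples = {
    "fragment_a": [2] * 12,
    "fragment_b": [3, 3, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0],
    "fragment_c": [0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "fragment_d": [1, 0, 1, 2, 1, 0, 0, 1, 0, 0, 1, 0],
}
labels = {name: classify_series(series) for name, series in examples.items()}
```

A constant error count does not imply that the translation text itself stayed the same, so a fuller treatment would also compare the translations and the individual error labels.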
About this article
Cover Date
2020-12-10
DOI
10.14357/08696527200412
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
neural machine translation; instability of machine translation; indicator-based evaluation; linguistic annotation; instability types
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation