Informatics and Applications
2021, Volume 15, Issue 2, pp 104-111
METHODS OF QUALITY ESTIMATION FOR MACHINE TRANSLATION: STATE-OF-THE-ART
- V. A. Nuriev
- A. Yu. Egorova
Abstract
The paper reviews state-of-the-art methods of quality estimation for machine translation. These methods are grounded in two general approaches: automatic and manual. Automatic assessment builds on comparing the machine translation system output against a human-generated reference translation.
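To make the reference-based idea concrete, here is a minimal sketch of a BLEU-style score (clipped n-gram precision combined with a brevity penalty, in the spirit of BLEU by Papineni et al.). It is an unsmoothed, sentence-level, pure-Python approximation for illustration only, not a production metric; the example sentences are invented.

```python
# Minimal BLEU-style score: clipped n-gram precision plus a brevity
# penalty. Unsmoothed and sentence-level, so it is an illustration
# of the reference-based approach, not a production metric.
from collections import Counter
from math import exp, log

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hyp, n)
        ref_ngrams = ngram_counts(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        precisions.append(overlap / max(sum(hyp_ngrams.values()), 1))
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, an empty n-gram overlap zeroes the score
    # The brevity penalty discourages hypotheses shorter than the reference.
    bp = min(1.0, exp(1 - len(ref) / max(len(hyp), 1)))
    return bp * exp(sum(log(p) for p in precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on a mat"), 3))  # ~0.537
```

The single substitution ("the" vs. "a") degrades the higher-order n-gram precisions more than the unigram one, which is why surface-overlap metrics of this family penalize local mismatches so sharply.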
Manual (human) evaluation primarily takes pragmatic and functional aspects into account: translation quality is assessed with regard to how well the system output fulfills the translation task. The first part of the paper presents automatic metrics for evaluating machine translation quality and discusses both their shortcomings and new trends in their development. The second part focuses on human evaluation of machine translation. It describes: (i) evaluation of adequacy and fluency; (ii) ranking of translations; (iii) direct assessment; (iv) computation of the human translation edit rate; and (v) translation annotation based on an error typology.
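Of the manual methods listed above, item (iv) is directly computable once a translator has post-edited the system output. Below is a simplified sketch under stated assumptions: the human(-targeted) translation edit rate is taken as the token-level edit distance from the raw MT output to its post-edited version, normalized by the post-edit length; full TER/HTER also counts block shifts, which are omitted here, and the sentences are invented.

```python
# Simplified human(-targeted) translation edit rate: token-level
# Levenshtein distance between the raw MT output and its human
# post-edit, normalized by the post-edit length. Full TER/HTER
# additionally counts block shifts, omitted in this sketch.
def hter(mt_output: str, post_edited: str) -> float:
    hyp, ref = mt_output.split(), post_edited.split()
    # Standard dynamic-programming edit distance over tokens.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        curr = [i]
        for j, r in enumerate(ref, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (h != r)))  # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

print(hter("he go to school yesterday", "he went to school yesterday"))
# 1 substitution / 5 post-edit tokens = 0.2
```

A lower rate means less post-editing effort, which is what makes this measure attractive as a proxy for the pragmatic usefulness of the raw output.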
About this article
Title
METHODS OF QUALITY ESTIMATION FOR MACHINE TRANSLATION: STATE-OF-THE-ART
Journal
Informatics and Applications
2021, Volume 15, Issue 2, pp 104-111
Cover Date
2021-06-30
DOI
10.14357/19922264210215
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
machine translation; translation quality; evaluation of machine translation quality; automatic metrics; direct assessment; typology of machine translation errors
Authors
V. A. Nuriev and A. Yu. Egorova
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation