Informatics and Applications
2019, Volume 13, Issue 3, pp 90-96
ARCHITECTURE OF A MACHINE TRANSLATION SYSTEM
Abstract
The paper describes the architecture of a Neural Machine Translation (NMT) system. The subject is relevant because NMT, i.e., translation using artificial neural networks, is now a leading machine translation paradigm.
NMT systems deliver considerably better output quality than the previous generation of machine translators (statistical translation systems). Yet the translations they produce may still contain various errors and remain relatively inaccurate compared with human translation. Therefore, to improve quality, it is important to understand more clearly how an NMT system is built and how it works. Its architecture commonly consists of two recurrent neural networks: one reads the input text sequence and the other generates the translated output sequence. An NMT system often also includes an attention mechanism that helps it cope with long input sequences. Google's NMT system is taken as an example, since the Google Translate service is among the most highly demanded today, processing around 143 billion words in more than 100 languages per day. The paper concludes with some perspectives for future research.
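To make the encoder-decoder scheme outlined above more concrete, below is a minimal sketch of a recurrent encoder, a recurrent decoder, and a dot-product attention step, written in Python with PyTorch. It is an illustrative assumption for exposition only, not the system discussed in the paper (Google's NMT uses deep LSTM stacks, residual connections, and a learned additive attention); all class names, dimensions, and the greedy decoding loop are hypothetical choices.

import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Reads the source token sequence and returns per-token hidden states."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):                                  # src: (batch, src_len)
        outputs, state = self.rnn(self.embed(src))
        return outputs, state                                # outputs: (batch, src_len, hidden)


class Decoder(nn.Module):
    """Generates target tokens one by one, attending to the encoder states."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim + hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_token, state, enc_outputs):
        # Dot-product attention: score every source position against the decoder state.
        query = state[-1].unsqueeze(1)                           # (batch, 1, hidden)
        scores = torch.bmm(query, enc_outputs.transpose(1, 2))   # (batch, 1, src_len)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights, enc_outputs)                # (batch, 1, hidden)

        emb = self.embed(prev_token).unsqueeze(1)                # (batch, 1, emb)
        output, state = self.rnn(torch.cat([emb, context], dim=-1), state)
        return self.out(output.squeeze(1)), state                # logits over target vocabulary


# Toy usage: "translate" a batch of two 5-token sentences for 3 decoding steps.
src = torch.randint(0, 1000, (2, 5))
encoder, decoder = Encoder(1000), Decoder(1200)
enc_outputs, state = encoder(src)
token = torch.zeros(2, dtype=torch.long)       # assumed <BOS> index 0
for _ in range(3):
    logits, state = decoder(token, state, enc_outputs)
    token = logits.argmax(dim=-1)               # greedy choice of the next word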
References (25)
- Bahdanau, D., K. Cho, and Y. Bengio. 2015. Neural machine translation by jointly learning to align and translate. Conference (International) on Learning Representations. San Diego, CA. 1-15. Available at: https://arxiv.org/pdf/1409.0473.pdf (accessed June 20, 2019).
- Castilho, Sh., J. Moorkens, F. Gaspari, I. Calixto, J. Tinsley, and A. Way. 2017. Is neural machine translation the new state of the art? Prague Bull. Math. Linguistics 108(1):109-120.
- Zhao, Y., Y. Wang, J. Zhang, and C. Zong. 2017. Cost-aware learning rate for neural machine translation. Chinese computational linguistics and natural language processing based on naturally annotated big data. Eds. M. Sun, X. Wang, B. Chang, and D. Xiong. Cham: Springer International Publishing. 85-93.
- Zamora-Martinez, F., and M. J. Castro-Bleda. 2018. Efficient embedded decoding of neural network language models in a machine translation system. Int. J. Neural Syst. 28(9):1850007(1-15).
- Hearne, M., and A. Way. 2011. Statistical machine translation: A guide for linguists and translators. Lang. Linguist. Compass 5(5):205-226.
- Koehn, Ph. 2010. Statistical machine translation. New York, NY: Cambridge University Press. 447 p.
- Choi, H., K. Cho, and Y. Bengio. 2018. Fine-grained attention mechanism for neural machine translation. Neurocomputing 284:171-176.
- Davenport, C. 2018. Google Translate processes 143 billion words every day. Available at: https://www.androidpolice.com/2018/10/09/google-translate-processes-143-billion-words-every-day (accessed June 20, 2019).
- Wu, Y., M. Schuster, Z. Chen, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. Available at: https://arxiv.org/pdf/1609.08144.pdf (accessed June 20, 2019).
- Johnson, M., M. Schuster, Q.V. Le, et al. 2017. Google's multilingual neural machine translation system: Enabling zero-shot translation. T. Assoc. Computational Linguistics 5:339-351.
- Jean, S., K. Cho, R. Memisevic, and Y. Bengio. 2015. On using very large target vocabulary for neural machine translation. 53rd Annual Meeting of the Association for Computational Linguistics and the 7th Joint Conference (International) on Natural Language Processing of the Asian Federation of Natural Language Processing Proceedings. Beijing, China: The Association for Computational Linguistics. 1:1-10.
- Hochreiter, S., and J. Schmidhuber. 1997. Long short-term memory. Neural Comput. 9(8):1735-1780.
- Gers, F. A., J. Schmidhuber, and F. Cummins. 2000. Learning to forget: Continual prediction with LSTM. Neural Comput. 12(10):2451-2471.
- He, K., X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV: IEEE. 770-778.
- Schuster, M., and K. Nakajima. 2012. Japanese and Korean voice search. IEEE Conference (International) on Acoustics, Speech and Signal Processing. Piscataway, NJ: IEEE. 5149-5152.
- Sutskever, I., O. Vinyals, and Q.V. Le. 2014. Sequence to sequence learning with neural networks. 27th Conference (International) on Neural Information Processing Systems Proceedings. Cambridge, MA: MIT Press. 2:3104-3112.
- Luong, M.-T., I. Sutskever, Q.V. Le, O. Vinyals, and W. Zaremba. 2015. Addressing the rare word problem in neural machine translation. 53rd Annual Meeting of the Association for Computational Linguistics and the 7th Joint Conference (International) on Natural Language Processing Proceedings. Beijing, China: Association for Computational Linguistics. 1:11-19.
- Hochreiter, S., Y. Bengio, P. Frasconi, and J. Schmidhuber. 2001. Gradient flow in recurrent nets: The difficulty of learning long-term dependencies. A field guide to dynamical recurrent networks. Eds. J. F. Kolen, and S. Kremer. Los Alamitos, CA: IEEE Press. 237-243.
- Pascanu, R., T. Mikolov, and Y. Bengio. 2013. On the difficulty of training recurrent neural networks. arxiv.org. Available at: https://arxiv.org/pdf/1211.5063v2.pdf (accessed June 20, 2019).
- Fahlman, S. E., and C. Lebiere. 1990. The cascade-correlation learning architecture. Advances in neural information processing systems. Ed. D. S. Touretzky. San Francisco, CA: Morgan Kaufmann Publishers Inc. 2:524-532.
- Srivastava, R. K., K. Greff, and J. Schmidhuber. 2015. Highway networks. arxiv.org. Available at: https://arxiv.org/pdf/1507.06228.pdf (accessed June 20, 2019).
- Schuster, M., and K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE T. Signal Proces. 45(11):2673-2681.
- Dean, J., G. S. Corrado, R. Monga, et al. 2012. Large scale distributed deep networks. Advances in neural information processing systems. Vol. 25. Available at: https://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf (accessed June 20, 2019).
- Kingma, D. P., and J. Ba. 2014. Adam: A method for stochastic optimization. arxiv.org. Available at: https://arxiv.org/pdf/1412.6980.pdf (accessed June 20, 2019).
- Gulcehre, C., O. Firat, K. Xu, K. Cho, and Y. Bengio. 2017. On integrating a language model into neural machine translation. Comput. Speech Lang. 45:137-148.
About this article
Title
ARCHITECTURE OF A MACHINE TRANSLATION SYSTEM
Journal
Informatics and Applications
2019, Volume 13, Issue 3, pp 90-96
Cover Date
2019-09-30
DOI
10.14357/19922264190313
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
neural machine translation; artificial neural networks; recurrent neural networks; attention mechanism; architecture of a machine translation system; Google's Neural Machine Translation system
Authors
V. A. Nuriev
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation