Systems and Means of Informatics
2014, Volume 24, Issue 2, pp 131-142
METHOD FOR EXTRACTING SINGLE-WORD TRANSLATION CORRESPONDENCES FROM PARALLEL TEXTS
USING DISTRIBUTIONAL SEMANTICS MODELS
- Yu. I. Morozova
- E. B. Kozerenko
- M. M. Sharnin
Abstract
This paper addresses problems in the corpus-based study of linguistic units.
It defines the task of extracting translation correspondences from a parallel
corpus and surveys existing approaches to that task. The focus is on an
approach based on distributional semantics models. The paper describes the
theoretical model developed by the authors and its software implementation.
A test parallel corpus of French and English patent texts was compiled for
this research, and the paper reports the results of an experiment on
extracting single-word translation correspondences from that corpus.
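
The abstract does not give the details of the authors' model, so the following is only a minimal sketch of the general distributional idea, not the method described in the paper. It assumes a sentence-aligned parallel corpus and represents each word by a vector of its occurrence counts over the aligned sentence pairs; a source word's translation candidate is the target word with the most similar vector (cosine similarity). The toy French-English sentences and the function names `sentence_vectors`, `cosine`, and `best_translation` are hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict


def sentence_vectors(sentences):
    """Map each word to a sparse vector {aligned sentence index: count}."""
    vectors = defaultdict(lambda: defaultdict(int))
    for idx, sentence in enumerate(sentences):
        for word in sentence.lower().split():
            vectors[word][idx] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_translation(word, src_vectors, tgt_vectors):
    """Return the target word whose vector is closest to the source word's."""
    if word not in src_vectors:
        return None
    src = src_vectors[word]
    return max(tgt_vectors, key=lambda t: cosine(src, tgt_vectors[t]), default=None)


# Hypothetical toy corpus: sentence i in French is aligned with sentence i in English.
french = [
    "le dispositif comprend un capteur",
    "le capteur mesure la pression",
    "la pression augmente",
]
english = [
    "the device comprises a sensor",
    "the sensor measures the pressure",
    "the pressure increases",
]

src_vectors = sentence_vectors(french)
tgt_vectors = sentence_vectors(english)

for source_word in ("capteur", "pression"):
    print(source_word, "->", best_translation(source_word, src_vectors, tgt_vectors))
```

On a corpus this small, many words share identical vectors, so the sketch is only meaningful for words whose distribution over sentence pairs is distinctive; a realistic system would also need lemmatization, frequency weighting, and filtering of function words.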
About this article
Cover Date
2013-11-30
DOI
10.14357/08696527140209
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
extracting translation correspondences; alignment; parallel texts;
parallel corpus; distributional semantics; vector space model
Authors
Yu. I. Morozova, E. B. Kozerenko, and M. M. Sharnin
Author Affiliations
Institute of Informatics Problems, Russian Academy of Sciences, 44-2 Vavilov Str.,
Moscow 119333, Russian Federation