Systems and Means of Informatics
2025, Volume 35, Issue 1, pp 111-124
INTEGRATION OF A DIGITAL DICTIONARY WITH PARALLEL CORPUS TEXTS: A NEW THEORETICAL APPROACH
- D. O. Dobrovol'skij
- I. M. Zatsman
Abstract
This paper considers the integration of a digital multilingual dictionary (exemplified by a German-Russian dictionary) with the texts of a parallel corpus within a lexicographic information system comprising three components: (i) a digital multilingual dictionary; (ii) a corpus serving as a repository of parallel texts; and (iii) a database of annotated translation correspondences together with two knowledge bases. The proposed approach synthesizes several conceptual procedures: applying the principle of multilevel structuring of dictionary entries; forming annotated translation correspondences for polysemous words and set phrases together with their translations; and linking the digital multilingual dictionary to the repository of parallel texts at the level of individual meanings of polysemous words and set phrases. Until now, such lexicographic information systems have been developed exclusively for monolingual dictionaries, with links established by lemma only. The aim of the paper is to describe the proposed approach as a theoretical basis for developing a lexicographic information system.
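The difference between lemma-level and meaning-level linking can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, field names, identifiers such as `Bank#1`, and the example sentences are all hypothetical and chosen for illustration only, and the two knowledge bases mentioned in the abstract are left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sense:
    """One meaning of a polysemous word or set phrase (hypothetical schema)."""
    sense_id: str
    lemma: str
    gloss: str
    translations: tuple[str, ...]


@dataclass(frozen=True)
class ParallelFragment:
    """An aligned source/target pair held in the repository of parallel texts."""
    fragment_id: str
    source_text: str
    target_text: str


@dataclass(frozen=True)
class Correspondence:
    """An annotated translation correspondence tying one sense to one fragment."""
    sense_id: str
    fragment_id: str
    translation_used: str


@dataclass
class LexicographicSystem:
    """Toy container for the dictionary, the corpus, and the correspondences."""
    senses: dict[str, Sense] = field(default_factory=dict)
    fragments: dict[str, ParallelFragment] = field(default_factory=dict)
    correspondences: list[Correspondence] = field(default_factory=list)

    def examples_for_sense(self, sense_id: str) -> list[ParallelFragment]:
        """Meaning-level link: only fragments annotated with this sense."""
        return [self.fragments[c.fragment_id]
                for c in self.correspondences if c.sense_id == sense_id]

    def examples_for_lemma(self, lemma: str) -> list[ParallelFragment]:
        """Lemma-level link, for contrast: fragments for all senses are mixed."""
        ids = {s.sense_id for s in self.senses.values() if s.lemma == lemma}
        return [self.fragments[c.fragment_id]
                for c in self.correspondences if c.sense_id in ids]


# Invented demo data: two meanings of the German noun "Bank".
system = LexicographicSystem()
for sense in (
    Sense("Bank#1", "Bank", "seat for several people", ("скамья", "скамейка")),
    Sense("Bank#2", "Bank", "financial institution", ("банк",)),
):
    system.senses[sense.sense_id] = sense

for frag in (
    ParallelFragment("f1", "Er sitzt auf der Bank.", "Он сидит на скамейке."),
    ParallelFragment("f2", "Sie arbeitet bei einer Bank.", "Она работает в банке."),
):
    system.fragments[frag.fragment_id] = frag

system.correspondences += [
    Correspondence("Bank#1", "f1", "скамейка"),
    Correspondence("Bank#2", "f2", "банк"),
]

for frag in system.examples_for_sense("Bank#1"):
    print(frag.source_text, "|", frag.target_text)
```

In this sketch, a query by lemma returns both fragments regardless of meaning, whereas a query by sense identifier returns only the fragment annotated for that meaning, which is the kind of link the abstract describes.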
About this article
Title
INTEGRATION OF A DIGITAL DICTIONARY WITH PARALLEL CORPUS TEXTS: A NEW THEORETICAL APPROACH
Journal
Systems and Means of Informatics
Volume 35, Issue 1, pp 111-124
Cover Date
2025-04-20
DOI
10.14357/08696527250106
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
lexicographic information system; parallel texts; digital multilingual dictionary; corpus; database of annotated translation correspondences
Authors
D. O. Dobrovol'skij and I. M. Zatsman
Author Affiliations
 Vinogradov Russian Language Institute of the Russian Academy of Sciences, 18/2 Volkhonka Str., Moscow 119019, Russian Federation
 Institute of Linguistics of the Russian Academy of Sciences, 1 bld. 1 Bolshoy Kislovsky Lane, Moscow 125009, Russian Federation
 Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation