Systems and Means of Informatics
2019, Volume 29, Issue 1, pp 180-193
INFORMATION TRANSFORMATIONS OF PARALLEL TEXTS IN KNOWLEDGE EXTRACTION
- A. A. Goncharov
- I. M. Zatsman
Abstract
The paper examines the task of goal-oriented discovery and filling of lacunas in linguistic typologies, which are considered as forms of knowledge representation. Solving this task involves several repeated stages that together form one iteration of the proposed solution: goal-oriented discovery, in parallel texts, of the knowledge required to fill the lacunas. Parallel texts, as an information resource, are transformed in the course of this process. The purpose of the paper is to describe the types of information transformations of parallel texts that are used during the early stages of knowledge discovery and filling of lacunas in linguistic typologies. As part of knowledge discovery, the parallel texts are first fragmented into objects of interpretation, and then a search is performed for potential sources of knowledge capable of filling the lacunas. The paper treats this fragmentation as one type of information transformation of parallel texts.
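The two early steps named in the abstract (fragmentation into objects of interpretation, then a search for potential sources of knowledge) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not define how fragments are chosen, so the marker list, the `InterpretationObject` type, the functions `fragment` and `find_candidates`, and the toy typology are all hypothetical. The sample sentences are invented and use transliterated Russian.

```python
from dataclasses import dataclass

# Hypothetical marker list; the abstract does not say what triggers fragmentation.
MARKERS = ("however", "although")


@dataclass(frozen=True)
class InterpretationObject:
    source: str  # source-language fragment
    target: str  # aligned translation fragment
    marker: str  # marker that made this pair a candidate


def words_of(text):
    """Lowercased words with trailing or leading punctuation stripped."""
    return {w.strip(".,;:!?").lower() for w in text.split()}


def fragment(pairs, markers=MARKERS):
    """Step 1 (assumed): keep aligned pairs whose source contains a marker."""
    objects = []
    for source, target in pairs:
        source_words = words_of(source)
        for marker in markers:
            if marker in source_words:
                objects.append(InterpretationObject(source, target, marker))
    return objects


def find_candidates(objects, typology):
    """Step 2 (assumed): keep objects whose translation matches no known entry.

    `typology` maps a marker to the set of translations already recorded;
    an object with none of them in its target may point to a lacuna.
    """
    candidates = []
    for obj in objects:
        known = typology.get(obj.marker, set())
        if not known & words_of(obj.target):
            candidates.append(obj)
    return candidates


# Invented aligned sentence pairs (English source, transliterated Russian target).
pairs = [
    ("However, he left.", "Odnako on ushel."),
    ("She stayed, although it rained.", "Ona ostalas', nesmotrya na dozhd'."),
    ("It was late.", "Bylo pozdno."),
]
# Invented typology: translations already recorded for each marker.
typology = {"however": {"odnako"}, "although": {"khotya"}}

objects = fragment(pairs)
candidates = find_candidates(objects, typology)
for c in candidates:
    print(c.marker, "->", c.target)
```

Here the third pair is discarded at the fragmentation step because it contains no marker, and only the "although" pair survives the search step, since its translation is not among the recorded ones. A real system would need linguistic annotation rather than word matching; the sketch only shows how fragmentation narrows the resource before the search runs.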
About this article
Cover Date
2019-03-30
DOI
10.14357/08696527190115
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
discovery of lacunas; filling of lacunas; linguistic typology; knowledge extraction from parallel texts; corpus linguistics; objects of interpretation
Authors
A. A. Goncharov and I. M. Zatsman
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation