Systems and Means of Informatics
2023, Volume 33, Issue 4, pp 102-114
SEARCH WITH EXCLUSION IN PARALLEL TEXTS
Abstract
The paper examines a method of search with exclusion in parallel texts. The method rests on an approach that treats a text as an ordered set of wordforms. Within this approach, bilingual search by exact form, by lemma, and by morphological features is considered. The approach supports not only these types of search but also search with exclusion, i.e., a search that finds in parallel texts those pairs of text fragments that contain a certain wordform in language A but do not contain any wordform from a given set in language B. To illustrate the idea, an example is given of searching for fragments with implicit logical-semantic relations in parallel texts stored in a database.
If the required wordform in language A marks a logical-semantic relation and the set of wordforms in language B covers as many translation variants of that wordform as possible, the search can yield pairs of fragments in which the relation is expressed by the required wordform in the language A fragment but remains implicit in the language B fragment.
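The idea of search with exclusion can be sketched in a few lines of Python. This is a minimal illustration and not the implementation described in the paper: the names `Wordform`, `FragmentPair`, `tokenize`, and `search_with_exclusion` are hypothetical, the fragments are held in an in-memory list rather than a database, the tokenizer is a naive whitespace split, and the English and French sentences are invented toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wordform:
    """One token of a text (hypothetical structure, for illustration only)."""
    form: str   # surface form as it appears in the text
    lemma: str  # dictionary form; here only a lowercase placeholder


@dataclass(frozen=True)
class FragmentPair:
    """An aligned pair of fragments, each an ordered tuple of wordforms."""
    a: tuple  # fragment in language A
    b: tuple  # fragment in language B


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a lemmatizer.
    return tuple(Wordform(t, t.lower()) for t in text.split())


def search_with_exclusion(pairs, required, excluded,
                          key=lambda w: w.form.lower()):
    """Return pairs whose A side contains `required` and whose B side
    contains none of `excluded`. `key` selects what is compared
    (exact form by default; pass `lambda w: w.lemma` to compare lemmas)."""
    required_key = required.lower()
    excluded_keys = {e.lower() for e in excluded}
    return [
        p for p in pairs
        if any(key(w) == required_key for w in p.a)
        and not any(key(w) in excluded_keys for w in p.b)
    ]


# Toy data (invented): language A is English, language B is French.
pairs = [
    FragmentPair(tokenize("He stayed home because it rained"),
                 tokenize("Il est resté chez lui car il pleuvait")),
    FragmentPair(tokenize("He stayed home because it rained"),
                 tokenize("Il pleuvait il est resté chez lui")),
    FragmentPair(tokenize("He stayed home and it rained"),
                 tokenize("Il est resté chez lui et il pleuvait")),
]

# "because" marks a causal relation; the excluded set lists some of its
# possible French translations (an illustrative, incomplete list).
hits = search_with_exclusion(pairs, "because", {"car", "puisque", "parce"})
```

With this toy data, only the second pair is returned: its English fragment contains "because", while its French fragment contains none of the listed markers, so the causal relation is left implicit in language B. The fuller the excluded set, the fewer false hits of this kind the search produces.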
About this article
Cover Date
2023-12-10
DOI
10.14357/08696527230410
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
bilingual search; search with exclusion; implicitness; knowledge extraction from texts; parallel texts; logical-semantic relations
Authors
A. A. Goncharov
Author Affiliations
Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation