Informatics and Applications
2021, Volume 15, Issue 2, pp 96-103
EXTRACTING KNOWLEDGE ABOUT MEANS OF EXPRESSION OF LOGICAL-SEMANTIC RELATIONS FROM THE SUPRACORPORA DATABASE
- A. A. Goncharov
- O. Yu. Inkova
Abstract
The goal of this paper is to demonstrate how parallel texts annotated in a supracorpora database (SCDB) can be used efficiently to extract knowledge about alternative means of expressing logical-semantic relations (LSRs). The authors review the most prominent discourse-annotated corpora (Penn Discourse Treebank, Prague Dependency Treebank, and Rhetorical Structure Theory Discourse Treebank) to support the observation that researchers have reached no consensus on which linguistic means should be considered connectives (i.e., prototypical markers of LSRs) and which should be deemed "alternative." The research shows that applying the comparative method together with the capabilities of the SCDB of connectives makes it possible not only to extract new knowledge about LSR markers but also to create thesauri of the various means of LSR expression, including the alternative ones, in the languages involved. In addition, the SCDB data make it possible to generate new knowledge about correlations between specific LSRs and unconventional means of expressing them, and to calculate how frequently these means are used in the languages studied.
About this article
Cover Date
2021-06-30
DOI
10.14357/19922264210214
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
supracorpora database; logical-semantic relations; connectives; knowledge generation; parallel texts
Authors
A. A. Goncharov and O. Yu. Inkova
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation