Informatics and Applications
2022, Volume 16, Issue 2, pp 52-59
PRINCIPLES OF DESCRIBING MARKERS OF LOGICAL-SEMANTIC RELATIONS AND THEIR HIERARCHY
- A. A. Durnovo
- O. Yu. Inkova
- N. A. Popkova
Abstract
The article examines annotation strategies in corpora with discourse markup. Corpora based on Rhetorical Structure Theory (RST) are shown to annotate only coherence relations, or rhetorical relations (RRs). In contrast, the Penn Discourse Treebank (PDTB) of the University of Pennsylvania annotates the markers of relations, as does the Supracorpora Database of Connectives. The RST Signaling Corpus (RST-SC), also based on RST, is shown to annotate RR markers, but it cannot combine the markup of RRs and their markers in a single annotation. This problem is solved by the GUM corpus and the Supracorpora Database of Hierarchy of Logical-Semantic Relations. The latter offers several advantages: it supports search, the collection of statistics, and bilingual annotations. This makes it possible to identify both universal and language-specific phenomena in the discursive organization of text.
About this article
Cover Date
2022-07-25
DOI
10.14357/19922264220207
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
supracorpora database; text corpus annotation; discourse relations; connective
Authors
A. A. Durnovo, O. Yu. Inkova, and N. A. Popkova
Author Affiliations
Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
University of Geneva, 22 Bd des Philosophes, CH-1205 Geneva 4, Switzerland