Informatics and Applications

2017, Volume 11, Issue 4, pp 118-125

APPROACHES TO ANNOTATION OF DISCOURSE RELATIONS IN LINGUISTIC CORPORA

  • M. G. Kruzhkov

Abstract

This paper examines the Supracorpora Database of Connectives (SCDB-Connectives), which is based on data from parallel corpora. The SCDB-Connectives provides structural and semantic annotation of Russian connectives and their translation correspondences in French (and, eventually, in other languages). The SCDB-Connectives annotation approach is compared with the latest developments in the annotation of discourse relations: the Penn Discourse Treebank (PDTB), an annotated corpus of discourse relations, and ISO 24617-8, a proposed standard for the annotation of semantic relations. Some of the important differences are discussed. PDTB and ISO 24617-8 allow annotating implicit discourse relations as well as explicit ones, while the SCDB-Connectives annotates only explicit relations, i.e., those expressed by connectives. Furthermore, PDTB and ISO 24617-8 provide a superior framework for annotating text spans for relation arguments, which allows annotating attribution for these arguments, such as source and type of the linked propositions. In addition, ISO 24617-8 specifies argument roles for asymmetrical discourse relations. On the other hand, the principal advantage of the SCDB-Connectives is that it allows annotating both connectives and their translation correspondences in parallel corpora, opening up new possibilities for contrastive studies. The SCDB-Connectives is based on a relational database rather than on the XML format, which helps to manage complex cross-linguistic data efficiently. Benefits of semantic annotation of connectives for both theoretical and practical purposes are also discussed.
