Informatics and Applications
2017, Volume 11, Issue 4, pp 118-125
APPROACHES TO ANNOTATION OF DISCOURSE RELATIONS IN LINGUISTIC CORPORA
Abstract
This paper examines the Supracorpora Database of Connectives (SCDB-Connectives), which is based on data from parallel corpora. The SCDB-Connectives provides structural and semantic annotation of Russian connectives and their translation correspondences in French (and, eventually, in other languages). The SCDB-Connectives annotation approach is compared with two of the latest developments in the annotation of discourse relations: the Penn Discourse Treebank (PDTB), an annotated corpus of discourse relations, and ISO 24617-8, a proposed standard for the annotation of semantic relations. Some of the important differences are discussed. PDTB and ISO 24617-8 allow annotating implicit discourse relations as well as explicit ones, while SCDB-Connectives annotates only explicit relations, i.e., those expressed by connectives. Furthermore, PDTB and ISO 24617-8 provide a superior framework for annotating text spans for relation arguments, which allows annotating attribution for these arguments, such as source and type of the linked propositions. In addition, ISO 24617-8 specifies argument roles for asymmetrical discourse relations. On the other hand, the principal advantage of the SCDB-Connectives is that it allows annotating both connectives and their translation correspondences in parallel corpora, opening up new possibilities for contrastive studies. The SCDB-Connectives is based on a relational database rather than on the XML format, which helps to manage complex cross-linguistic data efficiently. Benefits of semantic annotation of connectives for both theoretical and practical purposes are also discussed.
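To make the relational approach concrete, the following is a minimal sketch of how an explicit connective and its translation correspondence might be stored and queried. It is not the actual SCDB-Connectives schema: the table names, column names, the relation tag "contrast", and the example sentence pair are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an illustrative
# assumption, not the real SCDB-Connectives design.
SCHEMA = """
CREATE TABLE connective (
    id          INTEGER PRIMARY KEY,
    sentence_ru TEXT NOT NULL,   -- Russian source context
    form        TEXT NOT NULL,   -- connective as it appears in the text
    relation    TEXT NOT NULL    -- semantic tag for the discourse relation
);
CREATE TABLE correspondence (
    id            INTEGER PRIMARY KEY,
    connective_id INTEGER NOT NULL REFERENCES connective(id),
    lang          TEXT NOT NULL, -- target language code, e.g. 'fr'
    sentence      TEXT NOT NULL, -- aligned translation context
    form          TEXT           -- NULL if the translation has no explicit connective
);
"""


def build_demo_db():
    """Create an in-memory database holding one invented annotation pair."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    cur = conn.execute(
        "INSERT INTO connective (sentence_ru, form, relation) VALUES (?, ?, ?)",
        ("Он устал, но продолжал работать.", "но", "contrast"),
    )
    conn.execute(
        "INSERT INTO correspondence (connective_id, lang, sentence, form) "
        "VALUES (?, ?, ?, ?)",
        (cur.lastrowid, "fr",
         "Il était fatigué, mais il continuait à travailler.", "mais"),
    )
    conn.commit()
    return conn


def translations_of(conn, form, lang):
    """Return (relation, translated form) pairs for a connective in one language."""
    rows = conn.execute(
        "SELECT k.relation, c.form "
        "FROM connective AS k "
        "JOIN correspondence AS c ON c.connective_id = k.id "
        "WHERE k.form = ? AND c.lang = ? "
        "ORDER BY c.id",
        (form, lang),
    )
    return rows.fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print(translations_of(demo, "но", "fr"))
```

Keeping correspondences in a separate table keyed by language is one way a relational layout could accommodate further target languages without restructuring existing annotations, which would be harder to express in a single nested XML document.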
References (21)
- Loiseau, S., D. V. Sitchinava, Anna A. Zalizniak, and I. M. Zatsman. 2013. Information technologies for creating the database of equivalent verbal forms in the Russian-French multivariant parallel corpus. Informatika i ee Primeneniya - Inform. Appl. 7(2):100-109.
- Kruzhkov, M., N. Buntman, E. Loshchilova, D. Sitchinava, Anna A. Zalizniak, and I. A. Zatsman. 2014. Database of Russian verbal forms and their French translation equivalents. Computational Linguistics and Intellectual Technologies: Conference (International) "Dialogue 2016" Proceedings. Moscow: RGGU. 13(20):275-287.
- Kruzhkov, M. 2016. Supracorpora databases as corpus-based superstructure for manual annotation of parallel corpora. 8th Conference (International) on Corpus Linguistics. EPiC Ser. in Language and Linguistics. 1:236-248. Available at: https://easychair.org/publications/paper/270289 (accessed August 31, 2017).
- Mikhailov, M., and R. Cooper. 2016. Corpus linguistics for translation and contrastive studies: A guide for research. London - New York: Routledge. 234 p.
- Prasad, R., N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo, A. Joshi, and B. Webber. 2008. The Penn Discourse TreeBank 2.0. 6th Conference (International) on Language Resources and Evaluation Proceedings. Marrakech, Morocco. 2961-2968.
- Prasad, R., B. Webber, and A. Joshi. 2017. The Penn Discourse Treebank: An annotated corpus of discourse relations. Handbook of linguistic annotation. Springer. 1197-1217.
- Asher, N. 1993. Reference to abstract objects. Dordrecht - Boston: Kluwer Academic. 455 p.
- Webber, B. 2016. Concurrent discourse relations. Computational Linguistics and Intellectual Technologies: Conference (International) "Dialogue 2016" Proceedings. Moscow. 15(22):D. Available at: http://www.dialog-21.ru/media/3488/webber.pdf (accessed August 31, 2017).
- Prasad, R., B. Webber, and A. Joshi. 2014. Reflections on the Penn Discourse TreeBank, comparable corpora and complementary annotation. Computational Linguistics 40(4):921-950.
- Prasad, R., S. McRoy, N. Frid, A. Joshi, and H. Yu. 2011. The biomedical discourse relation bank. BMC Bioinformatics 12:188-205.
- Zufferey, S., and L. Degand. 2013. Annotating the meaning of discourse connectives in multilingual corpora. Corpus linguistics and linguistic theory. 1-24.
- Crible, L., and S. Zufferey. 2015. Using a unified taxonomy to annotate discourse markers in speech and writing. 11th Conference (International) on Computational Semantics Proceedings. London. 14-22.
- Bunt, H., and R. Prasad. 2016. ISO DR-Core (ISO 24617-8): Core concepts for the annotation of discourse relations. 12th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-12) Proceedings. Portoroz. 45-54.
- Zatsman, I., O. Inkova, and V. Nuriev. 2017. The construction of classification schemes: Methods and technologies of expert formation. Automatic Documentation and Mathematical Linguistics 51(1):27-41.
- Zatsman, I. 2012. Tracing emerging meanings by computer: Semiotic framework. 13th European Conference on Knowledge Management Proceedings. Reading: Academic Publishing International Ltd. 2:1298-1307.
- Zatsman, I., N. Buntman, M. Kruzhkov, V. Nuriev, and Anna A. Zalizniak. 2014. Conceptual framework for development of computer technology supporting cross-linguistic knowledge discovery. 15th European Conference on Knowledge Management Proceedings. Reading: Academic Publishing International Ltd. 3:1063-1071.
- Zatsman, I., and N. Buntman. 2015. Outlining goals for discovering new knowledge and computerised tracing of emerging meanings. 16th European Conference on Knowledge Management Proceedings. Reading: Academic Publishing International Ltd. 851-860.
- Lapshinova-Koltunski, E., and K. Kunz. 2014. Annotating cohesion for multilingual analysis. 10th Joint ACL-ISO Workshop on Interoperable Semantic Annotation Proceedings. Reykjavik. 57-64.
- Meyer, T., A. Popescu-Belis, N. Hajlaoui, and A. Gesmundo. 2012. Machine translation of labeled discourse connectives. 10th Conference of the Association for Machine Translation in the Americas Proceedings. San Diego, CA. Available at: http://publications.idiap.ch/index.php/publications/show/2391 (accessed August 31, 2017).
- Meyer, T. 2014. Discourse-level features for statistical machine translation. PhD thesis. Ecole Polytechnique Federale de Lausanne. Available at: http://publications.idiap.ch/downloads/papers/2015/Meyer_THESIS_2014.pdf (accessed August 31, 2017).
- Meyer, T., N. Hajlaoui, and A. Popescu-Belis. 2015. Disambiguating discourse connectives for statistical machine translation. IEEE/ACM Transactions on Audio, Speech and Language Processing 23(7):1184-1197.
About this article
Title
APPROACHES TO ANNOTATION OF DISCOURSE RELATIONS IN LINGUISTIC CORPORA
Journal
Informatics and Applications
2017, Volume 11, Issue 4, pp 118-125
Cover Date
2017-12-30
DOI
10.14357/19922264170415
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
discourse relations; discourse connectives; corpus linguistics; parallel corpora; supracorpora databases
Authors
M. G. Kruzhkov
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation