Informatics and Applications

2023, Volume 17, Issue 4, pp 81-87

PARALLEL CORPUS ANNOTATION: APPROACHES AND DIRECTIONS FOR DEVELOPMENT

  • A. A. Goncharov

Abstract

Possible directions for the development of parallel corpus annotation tools are presented in light of the current state of this area. The three main approaches to research on corpus material, (i) corpus-based, (ii) corpus-driven, and (iii) corpus-illustrated, are considered and the differences between them are briefly described. It is shown that, despite the abundance of corpus annotation tools, the vast majority are designed for monolingual corpora and/or offer very limited functionality for annotating textual data. The widest range of functions is provided by the supracorpora databases, and the web applications for accessing them, that are being developed at FRC CSC RAS: (i) forming the blocks of original and translated text that are necessary and sufficient for analyzing an occurrence of the studied language unit and its translation variant; (ii) identifying the occurrence of the studied language unit and its translation variant; (iii) selecting features that characterize the use of the studied language unit and its translation variant; and (iv) selecting features that characterize the translation correspondence. This set of functions covers a significant share of research problems, but it can be extended. Three directions for extending the existing functionality are suggested that can provide a more detailed description of linguistic material.
