Systems and Means of Informatics
2021, Volume 31, Issue 3, pp 101-112
The paper provides an overview of the concept, main structural constituents, and functions of supracorpora databases (SCDBs). Supracorpora databases are a novel type of structured information resource that significantly expands the capabilities of linguistic text corpora, parallel corpora in particular. The paper outlines the principal features and limitations of parallel corpora and demonstrates how SCDBs extend these features and overcome the limitations. Supracorpora databases allow linguistic experts to establish, record, and annotate translation correspondences between language units in the source and target texts, relying on faceted classification categories that the researchers themselves compose according to their requirements. The article also describes the general structure of the SCDB architecture developed at FRC CSC RAS, which incorporates corpus and subcorpus constituents that interact with one another as part of a common database.
Institute of Informatics Problems, Russian Academy of Sciences
corpus linguistics; supracorpora database; parallel corpus; linguistic annotation; information technologies; faceted classification
M. G. Kruzhkov
Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation