Systems and Means of Informatics
2015, Volume 25, Issue 2, pp 140-159
This article presents the information resources used in contrastive linguistic studies and their principal features. There are two main types of such information resources: typological databases and electronic text corpora. This paper focuses on the latter. Two types of corpora are particularly relevant to contrastive studies: comparable corpora, i.e., balanced collections of original texts in the languages being compared, and parallel (translation) corpora, i.e., collections of original texts in one of the compared languages aligned with their translations into the other compared language(s). In addition to describing the existing information resources for contrastive linguistic studies, this paper introduces a new type of resource, termed here corpus extension databases. The article outlines the features of such databases in comparison with electronic corpora and justifies the need to create them.
Institute of Informatics Problems, Russian Academy of Sciences
Keywords
linguistic studies; databases; typological databases; comparable corpora; parallel corpora; corpus extension databases
M.G. Kruzhkov
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control," Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation