Informatics and Applications
2017, Volume 11, Issue 1, pp 100-108
SUPRACORPORA DATABASE ON CONNECTIVES: TERM SYSTEM DEVELOPMENT
- Anna A. Zaliznyak
- I. M. Zatsman
- O. Yu. Inkova
Abstract
The article considers a supracorpora database (SCDB), a new type of linguistic information resource. The SCDB contains parallel texts in which source-language sentences are aligned with their target-language counterparts. One distinctive feature of the SCDB is that it supports annotation of the linguistic items under study (in this case, connectives). Another is that cross-linguistic annotation reveals a wide range of new entities and concepts in both informatics and linguistics. To describe these entities and concepts, a new multidisciplinary term system is proposed. On the one hand, linguists use the proposed terms to describe new basic knowledge generated by contrastive analysis of Russian connectives. On the other hand, the design of the SCDB architecture and its functional subsystems is based on these terms, which are also used to develop the corresponding information, linguistic, and software tools. Finally, the term system is needed to compare the outcomes of the project with similar results from other projects.
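As a purely illustrative aid, the sketch below shows one way an aligned sentence pair with connective annotations might be represented. It is not the SCDB schema: the class names (`AlignedPair`, `ConnectiveAnnotation`), the field names, the helper `equivalents`, and the Russian-French example sentences are all assumptions introduced here for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ConnectiveAnnotation:
    """One annotated connective and its translation equivalent (hypothetical schema)."""
    source_connective: str   # connective as it appears in the source sentence
    target_equivalent: str   # counterpart in the target sentence; "" if none
    tags: list = field(default_factory=list)  # free-form labels (assumed)


@dataclass
class AlignedPair:
    """A source sentence aligned with its translation, plus annotations."""
    source_sentence: str
    target_sentence: str
    annotations: list = field(default_factory=list)

    def annotate(self, source_connective, target_equivalent="", tags=None):
        # Simple substring check; a real system would store token offsets.
        if source_connective not in self.source_sentence:
            raise ValueError("connective not found in source sentence")
        if target_equivalent and target_equivalent not in self.target_sentence:
            raise ValueError("equivalent not found in target sentence")
        annotation = ConnectiveAnnotation(
            source_connective, target_equivalent, list(tags or [])
        )
        self.annotations.append(annotation)
        return annotation


def equivalents(pairs, connective):
    """Count the target equivalents recorded for one source connective."""
    counts = Counter()
    wanted = connective.casefold()
    for pair in pairs:
        for annotation in pair.annotations:
            if annotation.source_connective.casefold() == wanted:
                counts[annotation.target_equivalent] += 1
    return counts


# Hypothetical Russian-French examples, invented for this sketch.
pairs = [
    AlignedPair("Он устал, но продолжал работать.",
                "Il était fatigué, mais il a continué à travailler."),
    AlignedPair("Было поздно, но он не спал.",
                "Il était tard, pourtant il ne dormait pas."),
]
pairs[0].annotate("но", "mais", tags=["opposition"])
pairs[1].annotate("но", "pourtant", tags=["opposition"])
summary = equivalents(pairs, "но")
```

Under these assumptions, `summary` maps each recorded French equivalent of the connective to the number of annotated pairs in which it occurs, which is the kind of cross-linguistic comparison that annotated parallel texts make possible.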
About this article
Cover Date
2017-02-30
DOI
10.14357/19922264170109
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
supracorpora database; term system; connectives; linguistic annotation; parallel texts; corpus linguistics; chronotypical faceted classification
Authors
Anna A. Zaliznyak, I. M. Zatsman, and O. Yu. Inkova
Author Affiliations
Institute of Linguistics, Russian Academy of Sciences, 1-1 Bolshoy Kislovskiy Per., Moscow 125009, Russian Federation
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
University of Geneva, 22 Bd des Philosophes, CH-1205 Geneva 4, Switzerland