Informatics and Applications
2023, Volume 17, Issue 3, pp 93-99
FORMALIZED DESCRIPTION OF STATISTICAL INFORMATION PROCESSING IN DATABASES
- V. V. Vakulenko
- I. M. Zatsman
Abstract
The paper presents an overview of the stages of statistical processing of text data, from specific informational objects in databases to the values of the numerical characteristics of these objects. For example, if a database contains descriptions of full-text research articles, these descriptions are the specific informational objects. If such a database is appropriately populated, a multistage processing procedure makes it possible to determine the values of the numerical characteristics of the publication activity of a researcher, a scientific division, and a scientific organization as a whole. Such procedures begin with the processing of specific informational objects and end with computing the values of the numerical characteristics of these objects. At intermediate stages, tables and other verbal and numerical objects may be formed. Suppose the stages of statistical processing are designed to be reversible and the database implements a function for verifying the values of the numerical characteristics. Then the verification procedure begins with the values of the characteristics and ends with access to the specific informational objects that were used to compute these values. The paper proposes a formalized description of the stages of statistical processing of text data in databases. The proposed name for such a transformation of text data into numerical values is informational-mathematic transformation (IM-transformation). It combines three things: the processing of specific informational objects, the formation of verbal and numerical objects, and the mathematical computation of the values of numerical characteristics. Such a transformation of text data may include mathematical processing at certain stages; however, it cannot be reduced to mathematical processing alone. The goal of the paper is to propose principles for the formalized description of the IM-transformation of texts in databases.
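The reversibility described above can be illustrated with a minimal Python sketch. It is not the authors' IM-transformation formalism. It assumes a hypothetical `Article` record (with `article_id`, `researcher`, `division`) and hypothetical helpers `count_with_provenance` and `verify`. The sample data are invented. The sketch also simplifies by giving each article a single researcher. Alongside every computed count, it keeps the set of source identifiers. This lets verification start from a value at the researcher, division, or organization level and return to the articles behind it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Hypothetical informational object: one full-text research article.

    Simplifying assumption: each article has a single researcher.
    """
    article_id: str
    researcher: str
    division: str


def count_with_provenance(articles, key):
    """Aggregate articles by key(article) into {group: (count, source_ids)}.

    Storing the source ids next to each count is what makes the stage
    reversible: every computed value can be traced back to its objects.
    """
    groups = defaultdict(set)
    for article in articles:
        groups[key(article)].add(article.article_id)
    return {group: (len(ids), frozenset(ids)) for group, ids in groups.items()}


def verify(entry, articles):
    """Run the procedure backwards: from a value to the articles behind it."""
    count, ids = entry
    sources = [a for a in articles if a.article_id in ids]
    if len(sources) != count:
        raise ValueError("value does not match its source objects")
    return sources


# Invented sample data, for illustration only.
articles = [
    Article("A1", "R1", "Division 1"),
    Article("A2", "R1", "Division 1"),
    Article("A3", "R2", "Division 1"),
    Article("A4", "R3", "Division 2"),
]

by_researcher = count_with_provenance(articles, key=lambda a: a.researcher)
by_division = count_with_provenance(articles, key=lambda a: a.division)
whole_org = count_with_provenance(articles, key=lambda a: "organization")

print(by_researcher["R1"][0])          # 2
print(by_division["Division 1"][0])    # 3
print(whole_org["organization"][0])    # 4
print([a.article_id for a in verify(by_division["Division 1"], articles)])
# ['A1', 'A2', 'A3']
```

Storing identifier sets is only one way to make a stage reversible. A real database could instead keep explicit links between intermediate tables and source records.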
As an illustration, the paper formalizes the process of determining the frequency of translation variants of connectives. These connectives express intertextual relations between text fragments, and the data come from the supracorpora database of connectives developed at FRC CSC RAS.
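The final numerical stage of such a process, computing the frequency of translation variants of a connective, can be sketched in Python as follows. The record layout `(fragment_id, connective, translation_variant)`, the function name `variant_frequencies`, and all sample values are hypothetical. They are not the actual schema or contents of the supracorpora database. For each variant, the sketch returns the absolute count, the relative frequency, and the fragment identifiers it was computed from, so each value stays verifiable against its source fragments.

```python
from collections import defaultdict

# Hypothetical annotation records: (fragment_id, connective, translation_variant).
# The values are invented for illustration; they are not real database data.
records = [
    ("f1", "поэтому", "therefore"),
    ("f2", "поэтому", "so"),
    ("f3", "поэтому", "therefore"),
    ("f4", "поэтому", "that is why"),
    ("f5", "однако", "however"),
]


def variant_frequencies(records, connective):
    """Return {variant: (count, relative_frequency, fragment_ids)} for one connective."""
    sources = defaultdict(list)
    for fragment_id, conn, variant in records:
        if conn == connective:
            sources[variant].append(fragment_id)
    total = sum(len(ids) for ids in sources.values())
    # If the connective never occurs, sources is empty and no division happens.
    return {
        variant: (len(ids), len(ids) / total, ids)
        for variant, ids in sources.items()
    }


freqs = variant_frequencies(records, "поэтому")
for variant, (count, rel, ids) in sorted(freqs.items(), key=lambda kv: -kv[1][0]):
    print(f"{variant}: {count} ({rel:.2f}) from {ids}")
# therefore: 2 (0.50) from ['f1', 'f3']
# so: 1 (0.25) from ['f2']
# that is why: 1 (0.25) from ['f4']
```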
About this article
Cover Date
2023-10-10
DOI
10.14357/19922264230313
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
informational-mathematic transformation; text information; statistical processing of text information; supracorpora database
Author Affiliations
Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation