Informatics and Applications
2020, Volume 14, Issue 1, pp 113-120
ANALYTICAL TEXTOLOGY IN INTELLIGENT PROCESSING SYSTEMS FOR UNSTRUCTURED DATA
- E. B. Kozerenko
- M. Y. Mikheev
- N. V. Somin
- L. I. Ehrlich
- K. I. Kuznetsov
Abstract
The paper presents a new field of research at the intersection of linguistics, computer science, and philology. The field applies logical and statistical methods to unstructured data in the form of natural language texts in order to solve a number of tasks: extracting explicit and implicit knowledge from texts using a semantics-oriented linguistic processor; forming lexical statistical representations of texts; building analytical conclusions; discovering the author's idiostyle and the textual similarity of literary works based on the analysis of service words (function words) and other microtext elements; identifying the sentiment of texts; and building a full profile of the author's text based on the superposition of methods. A textological analysis of the "Blue Book" of the "Petersburg Diary" by Zinaida Hippius is considered as an example.
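The abstract does not specify how service-word statistics are computed, so the following is only a minimal sketch of one common way to build such a profile, not the authors' method. The word list `SERVICE_WORDS`, the function names, and the choice of cosine similarity as the comparison measure are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, illustrative list of English function words. A real study
# would use a curated list for the language of the analyzed texts.
SERVICE_WORDS = ["the", "and", "of", "in", "to", "but", "not", "on"]


def service_word_profile(text, vocabulary=SERVICE_WORDS):
    """Return the relative frequency of each service word among all tokens."""
    # Tokens are maximal runs of Unicode letters, lowercased.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[word] / total for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two profiles; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Under these assumptions, two texts are compared by computing `service_word_profile` for each and passing the two vectors to `cosine_similarity`; values closer to 1 indicate more similar service-word usage. Any authorship conclusion would need a much larger vocabulary and text sample than this sketch suggests.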
About this article
Title
ANALYTICAL TEXTOLOGY IN INTELLIGENT PROCESSING SYSTEMS FOR UNSTRUCTURED DATA
Journal
Informatics and Applications
2020, Volume 14, Issue 1, pp 113-120
Cover Date
2020-03-30
DOI
10.14357/19922264200115
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
natural language processing; statistical methods; cognitive technology; lexical semantic analysis; knowledge extraction from texts; analytical systems
Authors
E. B. Kozerenko, M. Y. Mikheev, N. V. Somin, L. I. Ehrlich, and K. I. Kuznetsov
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
Research Computing Center, Lomonosov Moscow State University, 1, bld. 4 Leninskie Gory, Moscow, GSP-1, 119991, Russian Federation