Informatics and Applications
2015, Volume 9, Issue 2, pp 93-110
ASSOCIATIVE PORTRAITS OF SUBJECT AREAS AS A TOOL FOR AUTOMATED CONSTRUCTION OF BIG DATA SYSTEMS FOR KNOWLEDGE EXTRACTION: THEORY, METHODS, VISUALIZATION, AND APPLICATION
- I. V. Galina
- E. B. Kozerenko
- Yu. I. Morozova
- N. V. Somin
- M. M. Charnine
Abstract
The paper presents a technique for developing knowledge extraction systems that is based on the automated formation of an associative portrait of a subject area (APSA) and the construction of a semantic context space (SCS). The APSA concept rests on the distributional hypothesis, which states that semantically equivalent (or related) lexemes occur in similar contexts and, conversely, that lexemes occurring in similar contexts are semantically close. The model uses an extended hypothesis that examines similarities and differences in the contexts not only of individual words but also of arbitrary multilexeme fragments, that is, meaningful word combinations.
Examples of implemented projects for different subject domains are given.
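The distributional hypothesis named in the abstract can be illustrated with a minimal sketch. It is not the authors' APSA or SCS implementation: the toy corpus, the window size of 2, the raw co-occurrence counts, and the function names `context_vectors` and `cosine` are all assumptions chosen for illustration, and it handles single words only, not the multilexeme fragments of the extended hypothesis.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(token, Counter())
            for j in range(lo, hi):
                if j != i:  # a token is not part of its own context
                    ctx[tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus; a real system would use a large domain corpus.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases mice",
    "the dog chases cats",
    "stocks fell sharply today",
]

vectors = context_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])        # words with shared contexts
sim_cat_stocks = cosine(vectors["cat"], vectors["stocks"])  # words with no shared contexts
print(sim_cat_dog, sim_cat_stocks)
```

In this toy corpus, "cat" and "dog" share most of their neighbouring words and therefore score higher than "cat" and "stocks", which share none. Raw counts are used only for brevity; a weighting scheme such as mutual information or log-likelihood would be a natural substitute.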
About this article
Cover Date
2015-02-30
DOI
10.14357/19922264150211
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
semantic modeling; associations; mathematical statistics; distributional semantics; big data; automated extraction of knowledge; digital natural language text corpora; semantic search; intelligent Internet technology
Authors
I. V. Galina, E. B. Kozerenko, Yu. I. Morozova, N. V. Somin, and M. M. Charnine
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation