Informatics and Applications
2018, Volume 12, Issue 3, pp 91-98
SEMANTIC PROCESSING OF UNSTRUCTURED TEXTUAL DATA BASED ON THE LINGUISTIC PROCESSOR PullEnti
- E. B. Kozerenko
- K. I. Kuznetsov
- D. A. Romanov
Abstract
The paper presents a method for building knowledge extraction systems on top of PullEnti, a software toolkit comprising algorithms for morphological and semantic-syntactic analysis that extract entities of specified types (persons, organizations, locations, and other target semantic objects) from natural language texts. PullEnti uses dynamically connected components (plugins), which makes it possible to activate various functions without recompiling; the semantic analysis unit is incorporated in this way. During the analysis, semantic units (tokens) are established; these are typed phrases: text, numerical data, etc. Examples of implemented projects for different subject areas are given.
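The plugin and typed-token ideas mentioned in the abstract can be illustrated with a minimal sketch. This is not the PullEnti API: every name here (Token, tokenize, register, ANALYZERS, analyze, and the two toy analyzers) is hypothetical, and the "capitalized" rule is only a crude stand-in for real entity recognition. The sketch assumes that analyzers are registered by name at run time and that each token carries a type such as text or number.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Token:
    """A typed unit of text (hypothetical structure, not PullEnti's)."""
    kind: str   # "text" or "number"
    value: str


def tokenize(text: str) -> List[Token]:
    """Split text into runs of digits and runs of letters, tagging each."""
    tokens = []
    for match in re.finditer(r"\d+|[^\W\d_]+", text):
        piece = match.group()
        kind = "number" if piece.isdigit() else "text"
        tokens.append(Token(kind, piece))
    return tokens


Analyzer = Callable[[List[Token]], List[str]]

# Registry of analyzers; new ones can be added without touching the core.
ANALYZERS: Dict[str, Analyzer] = {}


def register(name: str) -> Callable[[Analyzer], Analyzer]:
    """Decorator that plugs an analyzer into the registry under a name."""
    def decorator(func: Analyzer) -> Analyzer:
        ANALYZERS[name] = func
        return func
    return decorator


@register("year")
def find_years(tokens: List[Token]) -> List[str]:
    # Toy rule: any four-digit number is treated as a year.
    return [t.value for t in tokens if t.kind == "number" and len(t.value) == 4]


@register("capitalized")
def find_capitalized(tokens: List[Token]) -> List[str]:
    # Toy rule: capitalized words stand in for named entities.
    return [t.value for t in tokens if t.kind == "text" and t.value[0].isupper()]


def analyze(text: str, enabled: List[str]) -> Dict[str, List[str]]:
    """Run only the analyzers that the caller has switched on."""
    tokens = tokenize(text)
    return {name: ANALYZERS[name](tokens) for name in enabled}


if __name__ == "__main__":
    print(analyze("Ivanov joined Gazprom in 2018", ["year", "capitalized"]))
```

Passing a different list of names to analyze switches functions on or off without changing the core code, which is the property the abstract attributes to the plugin design.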
Received July 13, 2018
About this article
Cover Date
2018-08-30
DOI
10.14357/19922264180313
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
semantic modeling; named entities recognition; data intensive domains; automated systems of knowledge extraction; semantic search; intelligent Internet technologies
Authors
E. B. Kozerenko, K. I. Kuznetsov, and D. A. Romanov
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
National Research University "Higher School of Economics," 20 Myasnitskaya Str., Moscow 101000, Russian Federation