Systems and Means of Informatics
2015, Volume 25, Issue 3, pp 235-250
THE SYSTEM OF FACTS EXTRACTION FROM HISTORICAL TEXTS
- I. M. Adamovich
- O. I. Volkov
Abstract
Text surfing is described as a distinct subclass of Internet search, which is itself an important part of biographic investigation. Text surfing is the search for useful information whose nature cannot be foreseen, so an appropriate web search query cannot be formulated in advance. A technology for automatic fact extraction is proposed to support text surfing, and its implementation is described. Special attention is paid to the problem of anaphora resolution, in which the interpretation of an expression depends on another expression in the context. A new hierarchical view of a biographical fact is proposed and analyzed.
The experimental verification of the applicability of the proposed technology to memoir and historical literature is described. The article reports the results of these experiments, which confirm the applicability and promise of the proposed approach. The technology is intended for a wide range of users who are not professional historians or biographers. This is important today because public interest in family history is increasing.
About this article
Cover Date
2015-09-30
DOI
10.14357/08696527150315
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
biographic investigation; facts extraction from texts; anaphora resolution; hierarchy of facts
Authors
I. M. Adamovich and O. I. Volkov
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation