Systems and Means of Informatics
2018, Volume 28, Issue 3, pp 217-226
LINEAR ORDERING OF THE RULES SET IN THE T-PARSER SYSTEM OF FACTS EXTRACTION
- I. M. Adamovich
- O. I. Volkov
Abstract
The article focuses on the further development of T-parser, a system for automatically extracting facts from historical texts that forms part of the technology for automating historical and biographical research. The article analyses the defects of the current implementation and describes and justifies methods for correcting them by excluding cycles from the grammar and linearly ordering its rule set. The updated parsing algorithm is described, and its effectiveness is verified experimentally on real historical texts in comparison with the previous version. The experimental results confirm the high efficiency of the updated algorithm and its applicability to the technology of historical and biographical research automation.
The technology is intended for a broad range of nonprofessional users, which is timely given the growing public interest in family history. Directions for further modifying the algorithm to increase the efficiency of fact extraction are outlined.
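The abstract names the core technique, excluding cycles from the grammar and linearly ordering its rule set, without detailing it. As a hedged illustration only, and not the T-parser implementation, the sketch below treats a grammar as a graph in which nonterminal A points to nonterminal B when some rule for A uses B. In that view a cycle is a strongly connected component with more than one node (or a node with a self-loop), and once no such component remains, listing the components in dependency order gives a linear ordering in which every nonterminal comes after everything it is built from. The grammar format, the toy rules, and the functions `dependency_graph`, `strongly_connected_components`, `find_cycles` and `linear_order` are all hypothetical; Tarjan's algorithm is a standard choice swapped in here, not one the article is known to use.

```python
from typing import Dict, List, Set

# Hypothetical grammar representation (not T-parser's format): each nonterminal
# maps to its alternative right-hand sides. Symbols that are not keys are terminals.
Grammar = Dict[str, List[List[str]]]


def dependency_graph(grammar: Grammar) -> Dict[str, Set[str]]:
    """Edge A -> B when some rule for A has nonterminal B on its right-hand side."""
    graph: Dict[str, Set[str]] = {lhs: set() for lhs in grammar}
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            graph[lhs].update(sym for sym in rhs if sym in grammar)
    return graph


def strongly_connected_components(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Tarjan's algorithm. Components are emitted dependencies-first:
    if A uses B and they lie in different components, B's component comes first."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    components: List[List[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)  # next unused DFS number
        stack.append(v)
        on_stack.add(v)
        for w in sorted(graph[v]):  # sorted for deterministic output
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(sorted(component))

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return components


def find_cycles(grammar: Grammar) -> List[List[str]]:
    """Groups of nonterminals that (directly or indirectly) depend on themselves."""
    graph = dependency_graph(grammar)
    return [c for c in strongly_connected_components(graph)
            if len(c) > 1 or c[0] in graph[c[0]]]


def linear_order(grammar: Grammar) -> List[str]:
    """Order nonterminals so each appears after every nonterminal it uses.
    Only possible once cycles have been excluded from the grammar."""
    cycles = find_cycles(grammar)
    if cycles:
        raise ValueError(f"grammar contains cycles: {cycles}")
    return [c[0] for c in strongly_connected_components(dependency_graph(grammar))]


# Toy grammar (hypothetical); lowercase symbols are terminals.
GRAMMAR: Grammar = {
    "Fact": [["Person", "Event"]],
    "Person": [["Name"], ["Name", "Surname"]],
    "Event": [["verb", "Date"]],
    "Date": [["num", "month", "num"]],
    "Name": [["word"]],
    "Surname": [["word"]],
}

print(linear_order(GRAMMAR))  # ['Date', 'Event', 'Name', 'Surname', 'Person', 'Fact']

# Adding the alternative Event -> Fact creates the cycle Fact -> Event -> Fact.
CYCLIC: Grammar = {**GRAMMAR, "Event": [["verb", "Date"], ["Fact"]]}
print(find_cycles(CYCLIC))  # [['Event', 'Fact']]
```

The sketch only detects cycles and refuses to order a cyclic grammar. How a cycle should actually be excluded (for example, by rewriting or merging rules) depends on the grammar and is the part the article itself addresses, so it is not modeled here.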
About this article
Cover Date
2018-09-30
DOI
10.14357/08696527180317
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
facts extraction from texts; GLR-algorithm; pseudoorder; linear ordering; excluding cycles
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation