Systems and Means of Informatics
2018, Volume 28, Issue 4, pp 145-155
ELEMENTS OF MACHINE LEARNING IN THE T-PARSER SYSTEM OF FACTS EXTRACTION
- I. M. Adamovich
- O. I. Volkov
Abstract
The article continues the development of T-parser, a system for the automatic extraction of facts from historical texts and a component of the technology of historical and biographical research automation. It outlines ways to increase parsing speed by means of machine learning. The chosen forms of machine learning are described and justified, and the problems they may raise are formulated. A classification of parsing bifurcations is given. A filtering mechanism for building the precedent database, based on the methods of statistical quality control on an alternative basis, is described and justified.
The updated parsing algorithm is described, and its effectiveness is verified experimentally against the previous version on real historical texts. The experimental results confirm the high efficiency of the updated algorithm and its applicability to the technology of historical and biographical research automation. The technology is intended for a broad range of nonprofessional users, which is timely given the growing public interest in family history.
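The abstract does not specify how the precedent filter is computed. As a rough illustration only, statistical quality control on an alternative basis (commonly called acceptance sampling by attributes) can be run sequentially with Wald's sequential probability ratio test. The sketch below is not the authors' algorithm: the function name `sprt_decision`, the reading of an "outcome" as one pass/fail observation of a candidate precedent, and all threshold values are assumptions made for illustration.

```python
import math
from typing import Iterable


def sprt_decision(
    outcomes: Iterable[bool],
    p0: float = 0.05,    # assumed acceptable failure rate (hypothetical value)
    p1: float = 0.20,    # assumed unacceptable failure rate (hypothetical value)
    alpha: float = 0.05,  # risk of rejecting a good candidate
    beta: float = 0.10,   # risk of accepting a bad candidate
) -> str:
    """Wald's sequential test on pass/fail observations.

    Each item in `outcomes` is True when the observation failed.
    Returns "accept", "reject", or "undecided" if the data run out
    before either boundary is crossed.
    """
    if not (0.0 < p0 < p1 < 1.0):
        raise ValueError("require 0 < p0 < p1 < 1")
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie in (0, 1)")

    # Decision boundaries on the log-likelihood ratio.
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)

    # Increment of the log-likelihood ratio per observation.
    step_fail = math.log(p1 / p0)
    step_ok = math.log((1.0 - p1) / (1.0 - p0))

    llr = 0.0
    for failed in outcomes:
        llr += step_fail if failed else step_ok
        if llr <= lower:
            return "accept"
        if llr >= upper:
            return "reject"
    return "undecided"


if __name__ == "__main__":
    print(sprt_decision([False] * 14))
    print(sprt_decision([True] * 3))
    print(sprt_decision([False] * 5))
```

With these illustrative parameters, three consecutive failures are enough to reject a candidate, whereas fourteen consecutive successes are needed to accept it. A sequential plan of this kind stops as soon as the evidence is sufficient, which is why it suits a filter that sees observations one at a time.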
About this article
Cover Date
2018-11-30
DOI
10.14357/08696527180414
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
facts extraction from texts; machine learning; bifurcation; statistical quality control; training set
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation