Systems and Means of Informatics
2023, Volume 33, Issue 2, pp 132-141
APPLICATION OF THE CHAID ALGORITHM IN THE TECHNOLOGY OF CONCRETE HISTORICAL INVESTIGATION SUPPORT
- I. M. Adamovich
- O. I. Volkov
Abstract
This article continues a series of works on the technology of concrete historical investigation support. The technology is based on the principles of co-creation and crowdsourcing and is designed for a wide range of users who are not professional historians or biographers. The article examines the use of a decision tree method based on the CHAID algorithm to fill information gaps in a set of historical facts automatically, in order to identify potentially promising areas of research. The algorithm is described, and the reliability of its results when the data contain a high proportion of missing values is evaluated. The proportion of lacunae in the main sources of multiple facts is estimated, and it is concluded that the algorithm is applicable in principle and effective, taking into account the specifics of the technology. It is also shown that the CHAID algorithm extends and supplements the technology's existing means of detecting anomalies in concrete historical data.
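The gap-filling idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it simplifies CHAID considerably: it performs a single split, ranks predictors by the raw Pearson chi-square statistic instead of a Bonferroni-adjusted p-value, and omits category merging. The field names (`estate`, `parish`, `occupation`) and the records are hypothetical and serve only as an illustration.

```python
from collections import Counter, defaultdict


def chi_square(rows, predictor, target):
    """Pearson chi-square statistic of the predictor x target contingency table."""
    observed = Counter((r[predictor], r[target]) for r in rows)
    row_totals = Counter(r[predictor] for r in rows)
    col_totals = Counter(r[target] for r in rows)
    n = len(rows)
    stat = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat


def fill_gaps(records, predictors, target):
    """One-level split: choose the predictor most strongly associated with
    the target, then fill each missing target with the mode of its leaf."""
    known = [r for r in records if r[target] is not None]
    # Simplification: raw statistic, not the adjusted p-value used by CHAID.
    best = max(predictors, key=lambda p: chi_square(known, p, target))
    leaves = defaultdict(Counter)
    for r in known:
        leaves[r[best]][r[target]] += 1
    overall = Counter(r[target] for r in known)
    filled = []
    for r in records:
        if r[target] is None:
            # Fall back to the overall mode for an unseen predictor category.
            votes = leaves.get(r[best], overall)
            r = {**r, target: votes.most_common(1)[0][0]}
        filled.append(r)
    return best, filled


# Hypothetical facts; None marks a gap in the target attribute.
records = [
    {"estate": "peasant", "parish": "north", "occupation": "farmer"},
    {"estate": "peasant", "parish": "south", "occupation": "farmer"},
    {"estate": "peasant", "parish": "north", "occupation": "farmer"},
    {"estate": "merchant", "parish": "south", "occupation": "trader"},
    {"estate": "merchant", "parish": "north", "occupation": "trader"},
    {"estate": "merchant", "parish": "south", "occupation": "trader"},
    {"estate": "peasant", "parish": "south", "occupation": None},
    {"estate": "merchant", "parish": "north", "occupation": None},
]

best, filled = fill_gaps(records, ["estate", "parish"], "occupation")
print(best)
print([r["occupation"] for r in filled])
```

In this sketch, a missing predictor value (`None`) would simply form its own category, which mirrors the common CHAID convention of treating missing values as a separate category. A filled value is only a hypothesis pointing at a direction for research, not an established fact.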
About this article
Cover Date
2023-06-10
DOI
10.14357/08696527230213
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
concrete historical investigation; distributed technology; CHAID algorithm; missing data; anomalies
Authors
I. M. Adamovich and O. I. Volkov
Author Affiliations
Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation