Systems and Means of Informatics
2022, Volume 32, Issue 4, pp 112-123
CLUSTERING ALGORITHMS FOR TECHNOLOGY OF CONCRETE HISTORICAL INVESTIGATION SUPPORT
- I. M. Adamovich
- O. I. Volkov
Abstract
This article continues a series of works on the technology of concrete historical research support. The technology is based on the principles of co-creation and crowdsourcing and is designed for a wide range of users who are not professional historians or biographers. The article develops the technology further by integrating into it a mechanism for automated search for anomalies in concrete historical information, based on cluster analysis.
The specifics of concrete historical data and the ways they are represented in the technology's object model are analyzed. Methods for digitizing mixed data and the proximity measures used with them are considered in detail, and the advantages and disadvantages of clustering algorithms used to search for anomalies are evaluated. Based on this analysis, an approach to searching for anomalies in the technology's data was developed, and directions were outlined for testing the effectiveness of the selected algorithms and proximity measures on real concrete historical data.
About this article
Title
CLUSTERING ALGORITHMS FOR TECHNOLOGY OF CONCRETE HISTORICAL INVESTIGATION SUPPORT
Journal
Systems and Means of Informatics
Volume 32, Issue 4, pp 112-123
Cover Date
2022-11-30
DOI
10.14357/08696527220411
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
concrete historical investigation; distributed technology; anomaly; historical-biographical fact; clustering
Authors
I. M. Adamovich and O. I. Volkov
Author Affiliations
Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation