Systems and Means of Informatics
2024, Volume 34, Issue 2, pp 123-133
METHOD FOR SEARCHING FOR OPTIMAL PARAMETER VALUES OF THE ENTITY RESOLUTION ALGORITHM FOR CONCRETE HISTORICAL DATA
- I. M. Adamovich
- O. I. Volkov
Abstract
The article addresses the use of collective entity resolution in concrete historical investigation, specifically in the processing of nominative sources. The resolution method is based on a new relational clustering algorithm, which is a modification of the greedy agglomerative clustering algorithm. The article proposes a method for finding optimal parameter values of the collective entity resolution algorithm for tasks in concrete historical investigation. The method has three components: an analysis of the specifics of concrete historical data; a comparison of those data with test data for which estimates of the algorithm's effectiveness are available; and a procedure for finding the optimal parameters according to the Gauss-Seidel scheme, which searches for the optimum of the function along each variable in turn while the other variables are held fixed. The proposed method makes it possible to apply the considered entity resolution algorithm in real concrete historical research, in tasks of automated record linkage in nominative sources.
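The Gauss-Seidel scheme named in the abstract can be illustrated with a short Python sketch of a coordinate-wise search over discrete parameter grids. This is only an illustration under stated assumptions, not the authors' procedure: the parameter names `threshold` and `alpha`, the grids, and the toy `score` function are all hypothetical. In practice the score would presumably be an effectiveness estimate of the entity resolution algorithm on test data.

```python
from typing import Callable, Dict, Sequence, Tuple

Params = Dict[str, float]


def coordinate_search(
    score: Callable[[Params], float],
    grids: Dict[str, Sequence[float]],
    start: Params,
    max_sweeps: int = 20,
) -> Tuple[Params, float]:
    """Maximize `score` by optimizing one parameter at a time.

    Each sweep visits every parameter in turn, tries all values from its
    grid with the other parameters fixed, and keeps the best value found.
    The search stops when a full sweep brings no improvement.
    """
    current = dict(start)
    best = score(current)
    for _ in range(max_sweeps):
        improved = False
        for name, values in grids.items():
            for value in values:
                if value == current[name]:
                    continue
                trial = {**current, name: value}
                trial_score = score(trial)
                if trial_score > best:
                    best, current, improved = trial_score, trial, True
        if not improved:
            break
    return current, best


# Hypothetical stand-in for an effectiveness estimate (assumption):
# a smooth function with a single maximum at threshold=0.6, alpha=0.3.
def toy_score(p: Params) -> float:
    return -((p["threshold"] - 0.6) ** 2) - (p["alpha"] - 0.3) ** 2


# Hypothetical parameter grids on [0, 1] with step 0.1 (assumption).
grid = [i / 10 for i in range(11)]
params, value = coordinate_search(
    toy_score,
    grids={"threshold": grid, "alpha": grid},
    start={"threshold": 0.0, "alpha": 0.0},
)
print(params, value)
```

A search of this kind evaluates the score far fewer times than an exhaustive scan of the full grid, but it can stop at a local optimum when the parameters interact strongly, so the choice of starting point matters.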
About this article
Cover Date
2024-05-20
DOI
10.14357/08696527240209
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
concrete historical investigation; distributed technology; entity resolution; algorithm parameters; relational similarity measure
Authors
I. M. Adamovich and O. I. Volkov
Author Affiliations
Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation