Informatics and Applications
2023, Volume 17, Issue 4, pp 42-47
AN EXTENSIBLE APPROACH TO DATA FUSION IN DISTRIBUTED COMPUTING ENVIRONMENTS
- V. V. Sazontev
- S. A. Stupnikov
- V. N. Zakharov
Abstract
This paper addresses the development of methods and tools for data integration. One of the most important stages of data integration is data fusion, i.e., combining records that refer to the same real-world entity into a single record while resolving conflicts for each attribute. The paper gives a formal statement of the data fusion problem and briefly reviews the major groups of data fusion methods. It then proposes an approach to implementing the data fusion stage within an extensible heterogeneous data integration system in a distributed computing environment, and describes the software architecture and basic implementation ideas of the approach.
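To make the definition of data fusion concrete, the following is a minimal single-process sketch in Python, not the authors' implementation. It assumes the records have already been grouped by entity, and it resolves conflicts attribute by attribute with pluggable resolver functions. All names here (`fuse`, `majority_vote`, `longest_string`) and the sample records are hypothetical and chosen only for illustration. The distributed execution discussed in the paper is out of scope for this sketch.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Resolver = Callable[[List[Any]], Any]


def majority_vote(values: List[Any]) -> Any:
    """Return the most frequent value; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]


def longest_string(values: List[Any]) -> Any:
    """Prefer the value with the longest string form (e.g. a full name)."""
    return max(values, key=lambda v: len(str(v)))


def fuse(records: List[Record],
         resolvers: Dict[str, Resolver],
         default: Resolver = majority_vote) -> Record:
    """Fuse records describing one entity into a single record.

    Hypothetical helper: each attribute is resolved independently,
    using its own resolver if one is registered, else the default.
    """
    # Collect attribute names in first-seen order.
    attributes: List[str] = []
    for record in records:
        for name in record:
            if name not in attributes:
                attributes.append(name)

    fused: Record = {}
    for name in attributes:
        # Missing values (None or absent) do not take part in resolution.
        values = [r[name] for r in records if r.get(name) is not None]
        fused[name] = resolvers.get(name, default)(values) if values else None
    return fused


# Three records assumed to refer to the same real-world entity.
records = [
    {"name": "J. Smith", "city": "Moscow", "year": 1980},
    {"name": "John Smith", "city": "Moscow", "year": None},
    {"name": "John Smith", "city": "Moskva", "year": 1980},
]
fused = fuse(records, {"name": longest_string})
print(fused)
```

Registering resolvers per attribute is one simple way to keep such a step extensible: a new conflict resolution strategy is added as a new function without changing `fuse` itself.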
About this article
Cover Date
2023-12-10
DOI
10.14357/19922264230406
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
data fusion; distributed computing environment
Authors
V. V. Sazontev, S. A. Stupnikov, and V. N. Zakharov
Author Affiliations
Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation