Informatics and Applications
2019, Volume 13, Issue 2, pp 117-125
VIRTUAL EXPERIMENTS IN DATA INTENSIVE RESEARCH
- D. Y. Kovalev
- E. A. Tarasov
Abstract
The organization and management of virtual experiments (VEs) in data intensive research (DIR) have been widely studied over the past several years. The authors survey existing approaches to handling VEs and hypotheses and analyze VE management in a real use case from the astronomy domain. Existing systems that can act as platforms for conducting a VE are reviewed. Requirements for a system that organizes VEs in data intensive domains are gathered, and the overall structure and functionality of a system for running VEs are presented. The relationships between hypotheses and models in a VE are discussed. The authors also illustrate how to conceptually model VEs and their respective hypotheses and models. Potential benefits and drawbacks of such an approach are discussed, including the maintenance of experiment consistency and a reduction in the potential number of experiments.
About this article
Cover Date
2019-06-30
DOI
10.14357/19922264190216
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
virtual experiment; hypothesis; conceptual modeling; data intensive research
Authors
D. Y. Kovalev and E. A. Tarasov
Author Affiliations
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation