Informatics and Applications
2015, Volume 9, Issue 1, pp 28-54
METHODS AND TOOLS FOR HYPOTHESIS-DRIVEN RESEARCH SUPPORT: A SURVEY
- L. Kalinichenko
- D. Kovalev
- D. Kovaleva
- O. Malkov
Abstract
Data intensive research (DIR) is developing within the framework of a new research paradigm known
as the Fourth paradigm, which emphasizes the increasing role of observational, experimental, and computer-simulated
data in practically all research domains. The principal goal of DIR is the extraction (inference) of knowledge from
data. This work surveys the existing approaches, methods, and infrastructures for data analysis in DIR,
accentuating the role of hypotheses in this process and the efficient support of hypothesis
formation, evaluation, and selection in the course of modeling natural phenomena and carrying out experiments.
It also introduces the various concepts, methods, and tools intended for the effective organization of hypothesis-driven
experiments in DIR.
About this article
Cover Date
2014-10-30
DOI
10.14357/19922264150104
Print ISSN
1992-2264
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
data intensive research; Fourth paradigm; hypotheses; models; theories; hypothetico-deductive method;
hypothesis testing; hypothesis lattice; Galaxy model; connectome analysis; automated hypothesis generation
Authors
L. Kalinichenko, D. Kovalev, D. Kovaleva, and O. Malkov
Author Affiliations
Institute of Informatics Problems, Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
Institute of Astronomy, Russian Academy of Sciences, 48 Pyatnitskaya Str., Moscow 119017, Russian Federation