hypotheses relevant to a given data set are often dispersed across a myriad of resources such as the domain literature, formal ontologies, and public data sets or models [19]. At present, tools and applications that allow researchers to access and extract knowledge from domain-specific sources, and then use those resulting knowledge extracts to inform "high-throughput" hypothesis generation, remain relatively immature [17, 19]. As a result, significant additional effort is needed to design and validate such tools and provide them for regular use by the scientific community. Again, as was the case with the preceding problem area, and when taken as a whole, the types of motivating questions one might encounter relative to using knowledge-anchored methods to discover and test hypotheses concerning linkages between phenotypic and bio-molecular variables in large-scale or "big data" constructs can include:
1. What are all of the research questions I could ask regarding my data collection?
2. Based upon the contents of the current biomedical literature, are there interesting associations between data elements in my research data set that I should be exploring?
3. Can I augment a research data set with linked, open data so that I can test complex or otherwise intractable hypotheses? (A minimal sketch of this approach follows the list.)
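As an illustration of the third question, the sketch below queries a public linked-data resource for gene-disease associations that could be joined against a locally held research data set. This is a minimal sketch under stated assumptions, not a prescribed method: the choice of the Wikidata SPARQL endpoint, the genetic-association property (wdt:P2293), and the example disease item are all illustrative.

```python
# Minimal sketch: augmenting a local research data set with linked, open
# data. The endpoint, property, and item identifiers below are illustrative
# assumptions, not part of the methods described in the text.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://query.wikidata.org/sparql"  # public SPARQL endpoint

def genes_associated_with(disease_qid: str) -> list[str]:
    """Return labels of genes linked to a disease item, suitable for
    joining against locally measured phenotypic variables."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(f"""
        SELECT ?geneLabel WHERE {{
          ?gene wdt:P2293 wd:{disease_qid} .  # assumed genetic-association property
          SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
        }}
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [row["geneLabel"]["value"] for row in results["results"]["bindings"]]

# Example use: retrieve candidate genes for an illustrative disease item,
# then intersect them with the variables present in the local data set to
# surface associations worth testing as hypotheses.
candidate_genes = genes_associated_with("Q12206")
```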
6.3.3 The Provision of Systematic and Extensible Data-Analytic Pipelining Platforms
Often, BMI tools and methods are employed in the CTR setting in order to provide systematic data-analytic "pipelining" platforms that are capable of supporting the definition and reuse of data analysis workflows incorporating multiple source data sets, intermediate data analysis steps and products, and output types [22, 23]. The benefits of such data analysis pipelines are many, including: (1) support for the rapid execution of complex data analysis plans that would otherwise require time- and resource-intensive, manual, multi-step processes to transact, manipulate, and analyze data sets; and (2) the collection of information concerning the data analysis methods being used. In the case of the latter benefit, such information can be utilized to better understand the outcomes of such analyses, and to ensure reproducible results and high data quality through the documentation of all intermediate analytical processes and products [22, 23]. While these types of tools remain somewhat early in their development, their potential benefits are already being demonstrated in the computational biology, bioinformatics, and translational bioinformatics domains, where they have been used to enable the high-throughput and reproducible analysis of large amounts of multi-dimensional bio-molecular instrumentation data [22, 24-26]. Emergent efforts are similarly exploring their applicability to the integrative analysis of clinical phenotype data in combination with such bio-molecular data, in order to achieve translational end-points; a minimal sketch of such a pipeline appears below.
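To make the pipelining concept concrete, the following sketch chains analysis steps and records provenance (a timestamp and a digest of each intermediate product) as it runs. All names here are illustrative assumptions; production platforms of the kind cited above provide far richer workflow definition, scheduling, and provenance facilities.

```python
# Minimal sketch of a data-analytic pipeline with provenance capture.
# All class and step names are illustrative assumptions.
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Callable

class Pipeline:
    """Chains analysis steps and logs provenance for each intermediate product."""

    def __init__(self) -> None:
        self.steps: list[tuple[str, Callable[[Any], Any]]] = []
        self.provenance: list[dict[str, str]] = []

    def add_step(self, name: str, func: Callable[[Any], Any]) -> "Pipeline":
        self.steps.append((name, func))
        return self  # fluent chaining makes workflow definitions reusable

    def run(self, data: Any) -> Any:
        for name, func in self.steps:
            data = func(data)
            # Record what was produced, when, and a digest of the product,
            # so every intermediate result can be audited and reproduced.
            self.provenance.append({
                "step": name,
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "product_digest": hashlib.sha256(
                    json.dumps(data, sort_keys=True, default=str).encode()
                ).hexdigest(),
            })
        return data

# Example workflow: normalize phenotype records, then reduce them to a
# summary; both steps are stand-ins for real analytic operations.
pipeline = (
    Pipeline()
    .add_step("normalize", lambda rows: [{**r, "age": int(r["age"])} for r in rows])
    .add_step("summarize", lambda rows: {"n": len(rows)})
)
result = pipeline.run([{"id": "p1", "age": "42"}])
```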
Repeating the assessments in the preceding problem areas, and when taken as a whole, the types of motivating questions one might encounter in this problem area can include: