Biomedical Engineering Reference
• How many subjects have been enrolled in the study?
• Are there enough samples from the control and diseased subjects of matched conditions?
• What biomarkers can be used to tell cancer patients from patients with benign diseases?
• Is this tumor a recurrence of a previous cancer or a new one?
• Is there a molecular basis supporting the pathology diagnoses of these diseases?
• Which genotype profile can be predictive of sudden cardiac death?
• Are abnormally expressed genes at the RNA level also overexpressed or underexpressed at the protein level?
In our opinion, data centralization serves not the mechanical purpose of centralization itself but the purpose of addressing clinical and scientific questions. The former would simply be a response to the need to manage large amounts of data, whereas the latter ensures that a fundamentally useful data centralization structure is designed. A good understanding of the data themselves is also a prerequisite for data centralization.
In the preceding chapters, we described and discussed how disease-centric biomedical informatics data are generated, tracked, quality controlled, and quality assessed. These operations apply across the clinical, genomic, and proteomic platforms. For integrative biomedical informatics research, data centralization is necessary not only for the heterogeneous data typically generated in-house but also for the publicly available experimental and annotation data that complement the internal data. Centralizing both allows research clinicians and scientists to access data of different types effectively for both bottom-up and top-down research. One way to centralize such data is to develop a data warehouse (DW).
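As an illustration only, here is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a real warehouse, of how in-house clinical, RNA, and protein tables and a public gene-annotation table might be loaded into one place and queried together. It addresses a simplified form of one of the questions listed earlier: whether genes that change at the RNA level change in the same direction at the protein level. All table names, column names, patient IDs, gene names, values, and the fold-change threshold are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical toy data; every identifier and value is invented for illustration.
CLINICAL = [("P1", "cancer"), ("P2", "benign")]                  # in-house clinical records
MRNA = [("P1", "GENE_A", 2.1), ("P1", "GENE_B", -1.4),
        ("P2", "GENE_A", 0.2)]                                   # in-house RNA log2 fold changes
PROTEIN = [("P1", "GENE_A", 1.7), ("P1", "GENE_B", 0.9),
           ("P2", "GENE_A", 0.1)]                                # in-house protein log2 fold changes
ANNOTATION = [("GENE_A", "kinase"), ("GENE_B", "transcription factor")]  # public annotation


def build_warehouse() -> sqlite3.Connection:
    """Load the in-house and public sources into one in-memory database."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE clinical   (patient_id TEXT PRIMARY KEY, diagnosis TEXT);
        CREATE TABLE mrna       (patient_id TEXT, gene TEXT, log2fc REAL);
        CREATE TABLE protein    (patient_id TEXT, gene TEXT, log2fc REAL);
        CREATE TABLE annotation (gene TEXT PRIMARY KEY, role TEXT);
    """)
    con.executemany("INSERT INTO clinical VALUES (?, ?)", CLINICAL)
    con.executemany("INSERT INTO mrna VALUES (?, ?, ?)", MRNA)
    con.executemany("INSERT INTO protein VALUES (?, ?, ?)", PROTEIN)
    con.executemany("INSERT INTO annotation VALUES (?, ?)", ANNOTATION)
    return con


def rna_protein_concordance(con: sqlite3.Connection, diagnosis: str,
                            threshold: float = 1.0
                            ) -> List[Tuple[str, str, Optional[str], bool]]:
    """For patients with the given diagnosis, return genes whose absolute RNA log2
    fold change is at least `threshold`, plus a flag that is True when the protein
    change has the same sign (a protein change of exactly 0 is treated like a decrease)."""
    rows = con.execute("""
        SELECT c.patient_id, m.gene, a.role, m.log2fc, p.log2fc
        FROM clinical AS c
        JOIN mrna AS m ON m.patient_id = c.patient_id
        JOIN protein AS p ON p.patient_id = m.patient_id AND p.gene = m.gene
        LEFT JOIN annotation AS a ON a.gene = m.gene
        WHERE c.diagnosis = ? AND ABS(m.log2fc) >= ?
        ORDER BY c.patient_id, m.gene
    """, (diagnosis, threshold)).fetchall()
    return [(pid, gene, role, (rna > 0) == (prot > 0))
            for pid, gene, role, rna, prot in rows]


con = build_warehouse()
for row in rna_protein_concordance(con, "cancer"):
    print(row)
# ('P1', 'GENE_A', 'kinase', True)
# ('P1', 'GENE_B', 'transcription factor', False)
```

Once the sources share keys such as patient and gene identifiers, a cross-platform question reduces to a single join; without centralization, the same answer would require matching records by hand across separate systems.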
Note that in this chapter the term data centralization is used in place of the more common term data integration, because the DW models discussed later in the chapter include both an "integration" approach and a "federation" approach. Using the term data integration here could cause confusion when we are really referring to "putting data together in an organized manner," which can be done through integration, federation, or both. Thus, the more neutral term centralization is chosen here.
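The difference between the two approaches can be pictured, in simplified form, with the minimal sketch below. It assumes a common general reading of the terms: integration copies data from each source into one central store at load time, whereas federation leaves data in the sources and forwards each query to them at query time. The class names, the source names, and the modeling of a source as a function that returns records are all hypothetical and are not the DW models this chapter describes.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
# Hypothetical model: a source is any function that returns its current records,
# standing in for, e.g., a clinical database or a genomic data store.
Source = Callable[[], List[Record]]
Predicate = Callable[[Record], bool]


class IntegratedStore:
    """Integration: copy records from every source into one central store."""

    def __init__(self, sources: Dict[str, Source]) -> None:
        self.sources = sources
        self.rows: List[Record] = []

    def load(self) -> None:
        # Extract all records now and tag each copy with the source it came from.
        self.rows = [dict(r, source=name)
                     for name, fetch in self.sources.items()
                     for r in fetch()]

    def query(self, predicate: Predicate) -> List[Record]:
        # Answers come from the snapshot taken at the last load() call.
        return [r for r in self.rows if predicate(r)]


class FederatedView:
    """Federation: leave records in place and send each query to every source."""

    def __init__(self, sources: Dict[str, Source]) -> None:
        self.sources = sources

    def query(self, predicate: Predicate) -> List[Record]:
        # Sources are read at query time, so answers reflect their current contents.
        return [dict(r, source=name)
                for name, fetch in self.sources.items()
                for r in fetch()
                if predicate(r)]


# Usage with hypothetical in-memory sources.
clinical = [{"patient_id": "P1", "diagnosis": "cancer"}]
genomic = [{"patient_id": "P1", "gene": "GENE_A"}]
sources: Dict[str, Source] = {"clinical": lambda: clinical, "genomic": lambda: genomic}

store = IntegratedStore(sources)
store.load()
view = FederatedView(sources)

clinical.append({"patient_id": "P2", "diagnosis": "benign"})  # source changes after load


def is_p2(r: Record) -> bool:
    return r.get("patient_id") == "P2"


print(len(store.query(is_p2)))  # 0: the snapshot predates the new record
print(len(view.query(is_p2)))   # 1: the federated query reads the live source
```

The trade-off illustrated is freshness versus independence: the integrated store answers from a snapshot that must be refreshed, while the federated view always sees current source data but depends on every source being reachable at query time.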
8.1 An Overview of Data Centralization
One important approach to centralizing data is data warehousing. DW technology has been applied to retail, banking, transportation, and other industrial services for about 20 years. Two widely respected pioneers in the field, William H. Inmon and Ralph Kimball, define DW from different perspectives. Inmon defines a DW as "a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process" [1], whereas Kimball defines it as an integrated collection of several data marts [2]. Note