Gene). More importantly, in this earlier version of the DW [25], we integrated clinical data from questionnaires covering 800 fields in a breast cancer study, including demographics, clinical history, family history, lifestyle, risk factors, tissue type, tissue diagnosis, and other annotations. We also integrated internally generated, high-throughput experimental data. A multidimensional On-Line Analytical Processing (OLAP) application and a Spotfire application were also developed for data access and visualization.
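To make the idea of multidimensional OLAP access concrete, here is a minimal Python sketch of a cube-style roll-up over questionnaire records. It is an illustration only, not the DW's actual OLAP application: the records, the field names (`family_history`, `tissue_diagnosis`, `smoker`), and the `cube` helper are hypothetical. For every subset of the chosen dimensions it counts subjects, using `ALL` to mark a rolled-up dimension, in the spirit of the SQL `CUBE` grouping.

```python
from collections import Counter
from itertools import combinations

# Hypothetical questionnaire records; field names and values are illustrative only.
RECORDS = [
    {"family_history": "yes", "tissue_diagnosis": "IDC", "smoker": "no"},
    {"family_history": "no", "tissue_diagnosis": "IDC", "smoker": "yes"},
    {"family_history": "yes", "tissue_diagnosis": "DCIS", "smoker": "no"},
    {"family_history": "no", "tissue_diagnosis": "benign", "smoker": "no"},
]

ALL = "ALL"  # marker for a dimension that has been rolled up


def cube(records, dimensions):
    """Count subjects for every combination of dimension values,
    including rolled-up (ALL) levels, similar to a SQL CUBE grouping."""
    counts = Counter()
    for rec in records:
        # Each subset of dimensions kept at detail level yields one cube cell.
        for r in range(len(dimensions) + 1):
            for kept in combinations(dimensions, r):
                key = tuple(rec[d] if d in kept else ALL for d in dimensions)
                counts[key] += 1
    return counts


if __name__ == "__main__":
    c = cube(RECORDS, ["family_history", "tissue_diagnosis"])
    print(c[("yes", ALL)])  # subjects with a family history -> 2
    print(c[(ALL, "IDC")])  # subjects with an IDC diagnosis -> 2
    print(c[(ALL, ALL)])    # grand total -> 4
```

Precomputing aggregates like these is what lets an OLAP front end answer slice-and-dice questions without rescanning the raw records. With d dimensions each record contributes to 2^d cells, so real systems typically materialize only selected aggregation levels.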
Currently, the new version of the DW is based on an innovative clinical data module structure with a newly designed, advanced relational OLAP application capable of providing timely access to hundreds of clinical data elements from up to a million subjects. In addition to the OLAP application, a set of applications for data retrieval, visualization, analysis, and mining is envisioned within the biomedical informatics portal built on top of the DW.
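The schema behind the clinical data module structure is not described here. As one possible relational OLAP (ROLAP) layout, assumed purely for illustration, the sketch below stores one row per (subject, data element) observation in a fact table keyed to a subject dimension and a data-element dimension, so that adding a new clinical data element means adding a row to `element_dim` rather than a column. All table, column, and function names are hypothetical; the sketch uses only Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical ROLAP-style schema (an assumption, not the DW's actual design):
# a subject dimension, a data-element dimension, and a narrow fact table
# holding one row per (subject, clinical data element) observation.
SCHEMA = """
CREATE TABLE subject_dim (subject_id INTEGER PRIMARY KEY, age_group TEXT);
CREATE TABLE element_dim (element_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE observation_fact (
    subject_id INTEGER REFERENCES subject_dim(subject_id),
    element_id INTEGER REFERENCES element_dim(element_id),
    value TEXT
);
CREATE INDEX idx_fact_element ON observation_fact(element_id, value);
"""


def build_demo_db():
    """Create an in-memory database populated with a few toy observations."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO subject_dim VALUES (?, ?)",
                    [(1, "40-49"), (2, "50-59"), (3, "50-59")])
    con.executemany("INSERT INTO element_dim VALUES (?, ?)",
                    [(1, "tissue_diagnosis"), (2, "family_history")])
    con.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)",
                    [(1, 1, "IDC"), (2, 1, "IDC"), (3, 1, "DCIS"),
                     (1, 2, "yes"), (2, 2, "no"), (3, 2, "yes")])
    return con


def count_by(con, element_name, value):
    """Count subjects per age group whose given data element equals `value`."""
    sql = """
        SELECT s.age_group, COUNT(DISTINCT f.subject_id)
        FROM observation_fact f
        JOIN element_dim e ON e.element_id = f.element_id
        JOIN subject_dim s ON s.subject_id = f.subject_id
        WHERE e.name = ? AND f.value = ?
        GROUP BY s.age_group
        ORDER BY s.age_group
    """
    return con.execute(sql, (element_name, value)).fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    print(count_by(con, "tissue_diagnosis", "IDC"))  # [('40-49', 1), ('50-59', 1)]
```

The trade-off of this narrow layout is that queries combining several data elements need self-joins on the fact table; a wide table with one column per element avoids those joins but must be altered whenever an element is added.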
Other data centralization efforts take the form of search engines; examples include Entrez of the NCBI in the United States and the Sequence Retrieval System (SRS) developed by the EBI in Europe [26-28]. Entrez can search more than 30 major public databases, and SRS covers more than 130 biological databases. SRS also integrates more than 10 applications. In both Entrez and SRS, search results are returned in the original data formats of the source public databases.
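Entrez can also be queried programmatically through NCBI's E-utilities. The sketch below only builds request URLs for the ESearch (find matching record IDs) and EFetch (retrieve records) endpoints and does not contact the server; the helper names, the example query, and the example accession are illustrative. EFetch's `rettype` parameter is where the source databases' native formats, such as FASTA for sequence databases, come into play.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities; no network request is made in this sketch.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL; the response lists IDs of matching records."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?" + urlencode(params)


def efetch_url(db, ids, rettype, retmode="text"):
    """Build an EFetch URL; rettype selects the record format (e.g. 'fasta')."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url("pubmed", "BRCA1 AND breast cancer"))
    print(efetch_url("nuccore", ["NM_007294"], "fasta"))
```

Fetching these URLs (for example with `urllib.request.urlopen`) would return records in whatever format the source database provides, which matches the observation that neither Entrez nor SRS converts search results into a common warehouse format.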
Table 8.1 lists example DWs for biological and biomedical studies, with references and URL links. Some descriptions have been adapted from the corresponding websites. caCORE, the WRI DW, Entrez, and SRS are all listed, together with other data repositories that are DW-like in nature.
Biomedical data have their own characteristics, and the ways in which they are used differ markedly from practices in the health care, retail, banking, and other industries. A few papers have discussed this unique environment and its special needs for data centralization [34-36]. While some of the opinions presented in those papers are valuable, other comments apply to specific data centralization environments and may not generalize. For example, Koehler et al. stated that “A biological data integration system should support direct import of data from flat files rather than from separate database management systems” and that “When individual data sources have to be updated, it is assumed that all databases to be integrated are reimported and integrated” [34]. Based on our knowledge and practice, we believe that data loading should support multiple data formats. In addition, some source data types can accumulate to the terabyte level and beyond, making it impractical to always perform a complete reload. In such cases, data loading has to be incremental. We discuss these issues in more detail in Section 8.3.
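The two loading requirements argued for above, support for multiple source formats and incremental rather than complete reloading, can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the authors' implementation: the `store` and `hashes` dictionaries stand in for warehouse tables, every record is assumed to carry a unique `id`, and change detection uses a SHA-256 hash of each record's content. Only new or changed records are written, so unchanged data are never reimported.

```python
import csv
import hashlib
import io
import json


def parse_records(text, fmt):
    """Parse source text in one of several formats into a list of dicts.
    CSV values arrive as strings; JSON sources are assumed to use the same
    string representation so that content hashes are comparable."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        return json.loads(text)
    raise ValueError(f"unsupported format: {fmt}")


def record_hash(rec):
    """Stable content hash of a record (keys sorted so order does not matter)."""
    payload = json.dumps(rec, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def incremental_load(store, hashes, records, key="id"):
    """Insert new records, update changed ones, and skip unchanged ones.

    store:  dict mapping record key -> record (stands in for a DW table)
    hashes: dict mapping record key -> content hash from earlier loads
    Returns (inserted, updated, skipped) counts.
    """
    inserted = updated = skipped = 0
    for rec in records:
        k = str(rec[key])
        h = record_hash(rec)
        if k not in hashes:
            inserted += 1
        elif hashes[k] != h:
            updated += 1
        else:
            skipped += 1
            continue  # unchanged: no write needed
        store[k] = rec
        hashes[k] = h
    return inserted, updated, skipped


if __name__ == "__main__":
    store, hashes = {}, {}
    first = parse_records("id,diagnosis\n1,IDC\n2,DCIS\n", "csv")
    print(incremental_load(store, hashes, first))  # (2, 0, 0)
    second = parse_records('[{"id": "1", "diagnosis": "IDC"}, '
                           '{"id": "2", "diagnosis": "IDC"}, '
                           '{"id": "3", "diagnosis": "benign"}]', "json")
    print(incremental_load(store, hashes, second))  # (1, 1, 1)
```

At terabyte scale the hash lookup would live in the database itself, or be replaced by source-side timestamps or change logs, but the insert, update, or skip control flow stays the same.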
8.2 Types of Data in Question
As discussed in Chapter 1, translational biomedical informatics research draws on data from a broad range of clinical, genomic, and proteomic platforms, as well as on the knowledge generated in these fields. Without such a broad range of data, the questions raised earlier in this chapter cannot be answered. Currently a systematic col-
 