Biomedical Engineering Reference
informatics scientists for integrative studies, or by all of the above? Such require-
ments will not only govern the design of the reporting and query tools, but will also
directly or indirectly affect the data source selection, the size of the DW, hardware
and software selection, the data loading requirements, and so forth. Therefore, the
DW requirements should also include a high-level description of those major issues.
In addition, the organization's available resources and limitations should be
incorporated, together with, most importantly, the timeline for development.
In developing a DW, the developer organization and the user organization have
different interests and therefore different priorities. The developers hope to gather
the requirements, carve them in stone, develop the DW, and deliver the product. For
the developer organization, more data sources typically mean more work and
therefore more profit. The users, on the other hand, have the ultimate goal of creat-
ing a useful and flexible DW. In reality, most users typically have little or no experi-
ence in developing a DW, and often they have different opinions among themselves
regarding how the DW should be developed. They may, therefore, present a long
wish list for developers to work on, and that list inevitably changes during
development.
There is a saying that data warehousing is risky; estimates of the failure rate
range anywhere from 5% to 90%, depending on which criteria are used to define
success and failure [76-78]. In the life sciences, the situation could be
worse given the dynamic nature of these sciences and the fact that data warehousing
technology has been applied to these fields only recently. One reason for the failure
might be that the project drags on for so long that yesterday's design of the structure
cannot meet the needs of today's new technology and the project can never be
closed. To guard against this possibility, the budget should be fixed per project,
and payments should be tied to milestones on the agreed timeline, with penalty
clauses for delays that bind both sides. The budget should never be time-based
or open-ended. All of these factors should be considered
when documenting the DW requirements and preparing the contract.
8.3.3 Data Source Selection
The goal of the DW determines what kind of data should be included. Data source
selection directly determines how the DW should be developed. It determines the
size, the complexity, and the structural model of the DW, which in turn affect
hardware selection, the complexity of the loading scripts, the performance of the
system, and ultimately the usage of the DW.
Data source selection asks not only which data sources should be included, but
also which data elements should be taken from each source. For example, for
microarray experimental data, will you comply with the MIAME requirements? Do
you have additional specific needs? As another example, suppose you decide to
integrate data from the public database UniProt: if you are not doing sequence
analysis, do you need to include the protein sequences? If you are not integrating
other public databases, do you need all of the database cross-references?
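The element-level questions above can be recorded as an explicit selection rule
per data source. The following is a minimal sketch in Python, not a prescribed
design: the class SourceSelection, the functions fields_to_load and project, the
need labels, and the field names are all hypothetical and do not correspond to
the actual UniProt schema.

```python
from dataclasses import dataclass


@dataclass
class SourceSelection:
    """Which elements of one data source the DW loads (hypothetical structure)."""
    source: str
    required: frozenset      # fields that are always loaded
    optional_by_need: dict   # field name -> the need that justifies loading it


def fields_to_load(selection, needs):
    """Return the sorted field names to load for the declared set of needs."""
    chosen = set(selection.required)
    for field_name, need in selection.optional_by_need.items():
        if need in needs:
            chosen.add(field_name)
    return sorted(chosen)


def project(record, fields):
    """Keep only the selected fields of one source record."""
    return {name: record[name] for name in fields if name in record}


# Illustrative field names only (assumption, not the real UniProt layout).
uniprot = SourceSelection(
    source="UniProt",
    required=frozenset({"accession", "protein_name", "organism"}),
    optional_by_need={
        "sequence": "sequence_analysis",
        "cross_references": "public_db_integration",
    },
)

if __name__ == "__main__":
    # A DW with no sequence analysis and no other public databases
    # loads only the required fields.
    print(fields_to_load(uniprot, needs=set()))
```

Writing the rule down this way ties each optional element to the requirement
that justifies it, so a change in the DW requirements translates directly into
a change in what the loading scripts extract.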
From a practical point of view, a biomedical informatics DW should store results
data, not data kept for tracking purposes. The data fields for qual-
 