Biology Reference
In-Depth Information
technologies. Resolving the dependencies by merging article-oriented
datasets or eliminating duplicated patients can be tedious
and unnecessarily burden downstream statistical meta-analysis.
(2) No standard variable names or representation of values. The same
name may be used in different studies to mean different things
(e.g. “survival” may mean overall time until death or recurrence-free
time), or different names may be used to refer to the same entities. 6 In
addition, there is a need for better documentation of the technologies
used in deriving measured values. For example, several methods exist
to determine estrogen receptor (ER) status, including ligand binding
assay, immunohistochemistry, reverse transcription-polymerase chain
reaction (RT-PCR), and microarray; it may not make sense to consider
all methods as equivalent across studies. This example also highlights
the need for a hierarchy of variables. In this example, ER status is at a
higher level than the technology used to obtain it.
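
The naming and hierarchy problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synonym table, the canonical variable names, and the harmonize function are hypothetical and not part of any existing warehouse. The ambiguous name "survival" is deliberately left unmapped so that it must be resolved by hand, and each value keeps the method used to obtain it as a lower-level attribute.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical synonym table: study-specific names -> canonical names.
# "survival" is intentionally absent because its meaning differs by study.
CANONICAL_NAMES = {
    "er": "er_status",
    "er_status": "er_status",
    "estrogen_receptor": "er_status",
    "os_time": "overall_survival_time",
    "rfs_time": "recurrence_free_survival_time",
}


@dataclass(frozen=True)
class Measurement:
    variable: str                 # higher-level variable, e.g. "er_status"
    value: str                    # value as reported by the study
    method: Optional[str] = None  # lower-level detail: how it was measured


def harmonize(record: dict, methods: dict) -> list:
    """Map one study record onto canonical variables.

    `methods` maps a canonical variable name to the technology used in
    this study; variables without an entry get method=None.
    """
    result = []
    for name, value in record.items():
        key = name.strip().lower()
        if key not in CANONICAL_NAMES:
            # Refuse to guess: ambiguous or unknown names need curation.
            raise KeyError(f"unmapped variable name: {name!r}")
        canonical = CANONICAL_NAMES[key]
        result.append(Measurement(canonical, value, methods.get(canonical)))
    return result


if __name__ == "__main__":
    study = {"ER": "positive", "OS_time": "61"}
    print(harmonize(study, {"er_status": "immunohistochemistry"}))
```

Keeping the method alongside the value lets a later analysis decide whether, say, ER status by ligand binding assay and by microarray should be pooled or treated separately.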
(3) Difficulty of maintaining a consistent mapping of probes to genes.
This is essential for cross-platform matching based on genes. Since
the transcriptome is still continually being updated, it must remain
possible to map probes using up-to-date information sources.
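
A minimal sketch of gene-based cross-platform matching, assuming each platform's annotation is available as a plain probe-to-gene dictionary. The probe identifiers and function names below are hypothetical. Because the mapping is computed from the annotation on every call rather than stored, rerunning it with an updated annotation is enough to refresh the matches.

```python
from collections import defaultdict


def build_gene_index(probe_to_gene: dict) -> dict:
    """Invert a probe -> gene annotation into gene -> sorted list of probes."""
    index = defaultdict(list)
    for probe, gene in probe_to_gene.items():
        if gene:  # skip probes with no current gene assignment
            index[gene].append(probe)
    return {gene: sorted(probes) for gene, probes in index.items()}


def match_platforms(annot_a: dict, annot_b: dict) -> dict:
    """Return gene -> (probes on A, probes on B) for genes present on both."""
    index_a = build_gene_index(annot_a)
    index_b = build_gene_index(annot_b)
    shared = sorted(index_a.keys() & index_b.keys())
    return {gene: (index_a[gene], index_b[gene]) for gene in shared}


if __name__ == "__main__":
    # Hypothetical annotations for two platforms (probe IDs are made up).
    platform_a = {"A_001": "ESR1", "A_002": "ERBB2", "A_003": None}
    platform_b = {"B_101": "ESR1", "B_102": "ESR1", "B_103": "GATA3"}
    print(match_platforms(platform_a, platform_b))

    # A later annotation release assigns A_003 to a gene; rerun to update.
    platform_a_updated = dict(platform_a, A_003="GATA3")
    print(match_platforms(platform_a_updated, platform_b))
```

Several probes can map to the same gene, so the sketch returns lists of probes per gene and leaves the choice of how to summarize them to the analysis.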
(4) Selective inclusion of information. Some data warehouses tend to
concentrate on specific platforms or repositories. Because tumor
samples are nonrenewable, it is important to include all data emanating
from them. This includes older samples and RT-PCR data,
and not just the newest data obtained from a specific type of
microarray.
(5) Unclear or differing study design and patient selection criteria. Most
breast cancer expression data generated to date are based on samples
obtained from tumor banks (i.e. population-based sampling). More
recent studies may be based on patients selected for clinical trials,
implying completely different inclusion/exclusion criteria. Combining
studies of selected patients with population-based studies may result
in biased or uninterpretable results. Furthermore, some datasets also
contain multiple arrays per patient in order to yield longitudinal
information on tumor progression and metastasis or chemotherapy
response. This possibility implies a hierarchy of samples, analogous to