7. Breast Cancer Examples
We present two applications to breast cancer gene expression data
integration, focusing on methodology rather than biological interpretation.
We collected publicly available breast cancer survival datasets from
repositories such as GEO and ArrayExpress, as well as from journal
articles, selecting those produced on whole-genome microarrays with
medium to large sample sizes (Tables 1 and 2). Small numbers of
nonmalignant samples (normal breast tissue or fibroadenoma) were
present in some datasets. Almost all malignant tumors were invasive
ductal carcinoma.
Since multiple publications sometimes reuse the same patients, we
created datasets with unique patients by merging some publication-based
datasets or removing redundant patients. We used processed gene
expression values (log2 expression or ratio) as provided by the original
studies without further normalization.
Hybridization probes were remapped to Entrez GeneID [51] through
sequence alignment against the well-curated subset of the RefSeq mRNA
sequence database. This mapping procedure is conservative, but it
ensures high-quality cross-platform matching. Within a study, when
several probes mapped to the same gene, the probe with the greatest
variability across samples was chosen to represent that gene.
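
The probe-selection step can be sketched as follows. This is a minimal
illustration, not the authors' code: it assumes that "most variable" means
largest variance across samples, and the probe names, gene IDs, and
expression values are hypothetical.

```python
from statistics import pvariance


def most_variable_probe_per_gene(expr, probe_to_gene):
    """Pick one probe per gene: the one with the largest variance.

    expr:          dict probe_id -> list of expression values (one per sample)
    probe_to_gene: dict probe_id -> gene ID (probes missing here are dropped)
    Returns a dict gene ID -> selected probe_id.
    """
    best = {}  # gene ID -> (probe_id, variance)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe could not be mapped to a gene
        var = pvariance(values)  # assumption: variance as the variability measure
        if gene not in best or var > best[gene][1]:
            best[gene] = (probe, var)
    return {gene: probe for gene, (probe, _) in best.items()}


# Hypothetical toy data: two probes for gene 101, one for gene 202,
# and one probe (p4) that has no gene mapping.
expr = {
    "p1": [1.0, 1.1, 0.9, 1.0],
    "p2": [0.2, 2.0, -1.5, 1.2],
    "p3": [5.0, 5.5, 4.5, 5.2],
    "p4": [3.0, 3.0, 3.0, 3.0],
}
probe_to_gene = {"p1": 101, "p2": 101, "p3": 202}
selected = most_variable_probe_per_gene(expr, probe_to_gene)
```

Other variability measures (for example the interquartile range) could be
substituted in the same place without changing the rest of the sketch.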
Only 1963 genes were present on all platforms. To avoid discarding
useful information about many genes, we performed meta-analyses on
the union of all 17 198 genes. For genes absent from a dataset's
platform, the summary statistics were treated as missing values.
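
As an illustration of working on the union of genes rather than the
intersection, the sketch below builds a gene-by-dataset table of summary
statistics in which absent genes are marked as missing. The function name,
dataset labels, and values are hypothetical, and None is simply one
possible encoding of a missing value.

```python
def union_gene_table(per_dataset_stats):
    """Arrange per-dataset summary statistics on the union of all genes.

    per_dataset_stats: dict dataset name -> dict gene ID -> statistic
    Returns (dataset names, table, common genes), where table maps each
    gene ID to a list with one entry per dataset (None if the gene is
    absent from that dataset) and common genes are present in all datasets.
    """
    names = list(per_dataset_stats)
    genes = sorted(set().union(*(d.keys() for d in per_dataset_stats.values())))
    table = {g: [per_dataset_stats[n].get(g) for n in names] for g in genes}
    common = [g for g in genes if all(v is not None for v in table[g])]
    return names, table, common


# Hypothetical statistics from two datasets measured on different platforms.
per_dataset_stats = {
    "A": {1: 2.0, 2: -1.0},
    "B": {2: 0.5, 3: 1.5},
}
names, table, common = union_gene_table(per_dataset_stats)
```

Here only gene 2 is shared by both datasets, but genes 1 and 3 remain in the
table and can still contribute to a meta-analysis through the datasets that
measured them.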
Pooling patients from heterogeneous datasets to treat them as if they
were from a single cohort may result in false associations. Therefore, we
stratified all analyses by dataset and combined only summary statistics
(such as z-scores of regression models) [7]. This approach also circumvents
the problem of combining potentially incommensurable expression
measures from different microarray datasets. The z-scores are not
affected by arbitrary shifting or scaling of the expression data matrix of
each dataset.
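
The invariance of z-scores, and one way of combining them, can be sketched
as follows. Several assumptions are made here that are not in the text:
ordinary least-squares regression stands in for whatever regression model
is fitted in each dataset, the combination rule is an unweighted
Stouffer-type sum (one common choice, not necessarily the one used in
these examples), and all numbers are hypothetical.

```python
import math


def slope_z(x, y):
    """z-score (estimate / standard error) of the slope in y ~ a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope estimate
    a = my - b * mx                    # intercept estimate
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b / se


def combine_z(z_scores):
    """Unweighted Stouffer-type combination, skipping missing (None) entries."""
    present = [z for z in z_scores if z is not None]
    if not present:
        return None  # gene was not measured in any dataset
    return sum(present) / math.sqrt(len(present))


# Hypothetical expression values for one gene and a continuous outcome.
expression = [0.1, 0.8, 1.5, 2.1, 2.9, 3.6]
outcome = [1.0, 1.9, 2.4, 3.8, 4.1, 5.3]

z_original = slope_z(expression, outcome)

# Shift and (positively) rescale the expression values, as a different
# platform or preprocessing choice might; the z-score is unchanged.
rescaled = [3.0 * v + 7.0 for v in expression]
z_rescaled = slope_z(rescaled, outcome)

# Combine per-dataset z-scores for one gene; the second dataset lacks it.
combined = combine_z([2.0, None, 1.0, 3.0])
```

Rescaling the expression values changes the slope and its standard error by
the same factor, so their ratio is preserved. A negative scaling factor
would flip the sign of the z-score but not its magnitude.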