Table 2. Publicly available breast cancer gene signature datasets.

                             Number of genes
Signature                Original    Mapped
symbol       Reference   probes      to GeneID
ONC-16       46          16          16
NKI-70       30          70          52
EMC-76       32          60+16       48+12
NCH-70       40          70          69
CON-52       47          52          50
p53-32       33          32          19
CSR          48          512         457
IGS          49          —           —
GGI-128      44          128         98
CCYC         50          NA          126
7.1. Example I: Breast Cancer Survival
A Cox model of metastasis-free survival was fitted separately for each gene (no covariates were included in this example). For individual $i$ and gene $j$, we model the instantaneous failure rate, or hazard function $h(t)$, as a function of the expression level $x_{ij}$ with the Cox proportional hazards model:

$$h(t) = h_0(t)\exp(\beta_j x_{ij}). \tag{2}$$
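As an illustration of fitting model (2) one gene at a time, the following is a minimal sketch in plain Python. It maximizes the Cox partial likelihood for a single covariate by Newton-Raphson and reports the coefficient divided by its standard error as a z-score. The function names, the Breslow-style treatment of tied event times, and the toy data are assumptions made for this example; they are not taken from the studies in Table 2, and a real analysis would normally use an established survival package.

```python
import math


def score_and_information(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood
    for one covariate (ties handled Breslow-style; an assumption here)."""
    score = 0.0
    info = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored individuals contribute only through risk sets
        s0 = s1 = s2 = 0.0
        for k in range(n):
            if times[k] >= times[i]:  # individual k is still at risk
                w = math.exp(beta * x[k])
                s0 += w
                s1 += w * x[k]
                s2 += w * x[k] * x[k]
        mean = s1 / s0
        score += x[i] - mean
        info += s2 / s0 - mean * mean
    return score, info


def fit_single_gene_cox(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t) = h0(t) * exp(beta * x) for one gene.

    Returns (beta_hat, standard_error, z_score)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        if info <= 0.0:
            raise ValueError("the data carry no information about beta")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # Re-evaluate the information at the final estimate for the standard error
    _, info = score_and_information(beta, times, events, x)
    se = 1.0 / math.sqrt(info)
    return beta, se, beta / se


# Hypothetical toy data: follow-up time, event indicator
# (1 = metastasis observed, 0 = censored), and expression of one gene.
toy_times = [5, 8, 12, 14, 20, 21, 30, 35]
toy_events = [1, 1, 0, 1, 1, 0, 1, 0]
toy_expr = [2.1, 1.7, 0.3, 1.9, 0.8, 1.0, 0.2, 0.5]

beta_hat, se_hat, z_score = fit_single_gene_cox(toy_times, toy_events, toy_expr)
print(beta_hat, se_hat, z_score)
```

Running the same function over every gene in a study would yield one z-score per gene. The nested loop over risk sets is quadratic in the number of individuals, which is acceptable for a sketch but not tuned for large cohorts.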
For each gene, there is an estimated coefficient $\hat{\beta}_j$. These estimated coefficients are standardized and then combined across studies to obtain a single combined statistic for gene $j$.

Figure 1 shows scatter plots of single-gene $z$-scores (standardized Cox model regression coefficients) from the two largest studies, NKI and EMC. These scatter plots illustrate three rules for determining the significance of combined results: combined $Z$, Fisher $p$-value combination, and the Venn diagram rule requiring genes to be selected by both studies. Equal-significance contours for each rule are given for four significance levels ($\alpha$ = 0.01, 0.001, 0.0001, and 0.00001). One-sided tests for only large values of the relevant statistic are depicted; the largest negative values are tested similarly.
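The three combination rules can be sketched for a single gene as follows. This is a minimal illustration under stated assumptions: the combined Z is taken to be the unweighted sum of z-scores divided by the square root of the number of studies, the studies are treated as independent, p-values are one-sided for large z, and all function names are hypothetical. The text does not specify weights, so a weighted combined Z may have been used in practice.

```python
import math


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def combined_z(z_scores):
    """Unweighted combined Z (assumed form): sum(z) / sqrt(K)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def fisher_combined_p(p_values):
    """Fisher combination: -2 * sum(log p) is chi-square with 2K degrees
    of freedom under the null, assuming independent studies."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # half the chi-square statistic
    # Closed-form upper tail of a chi-square with even degrees of freedom 2K:
    # exp(-x/2) * sum_{i=0}^{K-1} (x/2)^i / i!
    term = 1.0
    total = 0.0
    for i in range(k):
        total += term
        term *= half_stat / (i + 1)
    return math.exp(-half_stat) * total


def venn_selected(p_values, per_study_alpha):
    """Venn diagram rule: the gene must be selected by every study."""
    return all(p <= per_study_alpha for p in p_values)


# Hypothetical z-scores for one gene in two studies.
z_pair = [3.1, 2.4]
p_pair = [upper_tail_p(z) for z in z_pair]

print(upper_tail_p(combined_z(z_pair)))  # combined Z rule
print(fisher_combined_p(p_pair))         # Fisher rule
print(venn_selected(p_pair, 0.01))       # Venn rule at a per-study threshold
```

Note that the Venn rule here takes a per-study threshold. Under independence, requiring every one of K studies to reach threshold c gives an overall level of c to the power K, so matching an overall level alpha would mean using c equal to alpha to the power 1/K.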