Integrated Analysis of Gene Expression Profiling Studies — Examples in Breast Cancer - Bioinformatics: A Swiss Perspective

ignored). The resulting combined z-score should be (approximately) distributed as standard normal, N(0, 1), and the scores can be ranked according to size or p-value.

We note that this procedure is not necessarily optimal for every situation. For example, if all test statistics from the individual studies have a common distribution that is not Gaussian, then a specialized, more powerful procedure can generally be constructed. In addition, we use equal weighting for each study. It may instead be desirable to use different weights based on, for instance, sample size, variance, or quality measures. However, optimal weights will depend on the alternatives of interest, which may also vary between studies. We have found the equal-weight form to be quite useful for large-scale, automated combining and exploratory analysis of studies, for which the models may be very heterogeneous.
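The contrast between equal and unequal weighting can be sketched in a few lines. The combining formula itself is not shown in this excerpt, so the sketch below assumes a weighted inverse-normal (Stouffer-type) combination, in which the weighted sum of per-study z-scores is divided by the square root of the sum of squared weights; the function name `combine_z` and the use of `None` for a missing study are illustrative choices, not the authors' code.

```python
import math

def combine_z(z_scores, weights=None):
    """Combine per-study z-scores for one gene into a single z-score.

    z_scores: list of floats; None marks a study in which the gene is missing.
    weights:  optional list of non-negative floats (default: equal weights).
    Assumption: a Stouffer-type combination, not necessarily the exact
    formula used in the source text.
    """
    if weights is None:
        weights = [1.0] * len(z_scores)
    # Studies with a missing value for this gene are ignored.
    pairs = [(w, z) for w, z in zip(weights, z_scores) if z is not None]
    denom = math.sqrt(sum(w * w for w, _ in pairs))
    if denom == 0.0:
        return None  # no usable study for this gene
    # Dividing by sqrt(sum of squared weights) keeps the result N(0, 1)
    # when the inputs are independent N(0, 1) under the null hypothesis.
    return sum(w * z for w, z in pairs) / denom

equal = combine_z([1.0, 2.0, None, 3.0])         # equal weights, one study missing
weighted = combine_z([2.0, 2.0], [3.0, 4.0])     # hypothetical unequal weights
```

With equal weights this reduces to the sum of the available z-scores divided by the square root of the number of contributing studies, which is why no study-specific tuning is needed for automated, large-scale use.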

6.6. Multiple Testing

An issue in large-scale genomic studies is the multiplicity problem when testing thousands of null hypotheses. Although often ignored in meta-analyses of genomic data, adjustment of p-values is needed to provide a realistic assessment of significance for each gene.

We carry out p-value adjustment based on the final combined z-scores z_i. Among the most widely used adjustments are the Bonferroni correction, which controls the family-wise error rate (FWER) and is very quick to compute but is not recommended because its adjusted p-values are overly conservative, and false discovery rate (FDR)-adjusted p-values [29]. We typically adjust using either FDR or maxT [17], another FWER-controlling adjustment, which is less conservative than the Bonferroni correction because it takes between-gene correlations into account.
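To make the difference between the two quick adjustments concrete, here is a minimal sketch that converts z-scores to two-sided p-values and then applies the Bonferroni correction and an FDR adjustment. It assumes the FDR adjustment is the Benjamini-Hochberg step-up procedure, which is a common choice but is not confirmed by this excerpt; the function names are illustrative.

```python
import math

def two_sided_p(z):
    """Two-sided p-value of a z-score under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def bonferroni(pvals):
    """FWER control: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """FDR-adjusted p-values (assumed here to be Benjamini-Hochberg)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]   # made-up p-values for four genes
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
```

For these made-up inputs the Bonferroni-adjusted values are never smaller than the FDR-adjusted ones, which illustrates the conservativeness noted above. Neither adjustment uses the correlation between genes.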

To compute maxT, the joint null distribution of the test statistics is estimated by bootstrapping. Bootstrap replicates are obtained for each individual study and then modeled and analyzed as described above, resulting in a new set of combined z-scores, z_i*. Then, for each bootstrap replicate, the maximum value of z_i* is taken, yielding a null distribution of max z*, from which the final adjusted p-values are obtained.
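The final step of this procedure can be sketched as follows. The sketch assumes the bootstrap replicates have already been generated and reduced to combined z-scores under the null (that part depends on each study's model and is not shown), and it implements a single-step variant that compares each observed score with the bootstrap maxima. The function name `max_t_adjust` and the toy numbers are hypothetical.

```python
def max_t_adjust(observed, boot_null):
    """Single-step maxT-style adjusted p-values (illustrative sketch).

    observed:  list of observed combined z-scores, one per gene.
    boot_null: list of bootstrap replicates; each replicate is a list of
               combined null z-scores, one per gene, in the same gene order.
    """
    # One maximum per bootstrap replicate gives the null distribution
    # of the maximum statistic.
    max_null = [max(replicate) for replicate in boot_null]
    n_boot = len(max_null)
    # Adjusted p-value: fraction of bootstrap maxima at least as large
    # as the observed score for that gene.
    return [sum(1 for m in max_null if m >= z) / n_boot for z in observed]

observed = [3.0, 1.0]                      # toy observed scores for two genes
boot_null = [[0.5, 2.0], [1.5, 0.2],       # four toy bootstrap replicates
             [3.5, 0.1], [0.9, 0.8]]
adj_p = max_t_adjust(observed, boot_null)  # [0.25, 0.75]
```

Because each replicate keeps all genes together, the correlation between genes is carried into the distribution of the maximum. For a two-sided analysis, absolute values of both observed and bootstrap scores would be used instead; in practice far more than four replicates are needed for usable p-values.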
