Biomedical Engineering Reference
In-Depth Information
diseases mentioned above used a method of Marchini et al. [46]. This imputation method
uses HapMap LD and estimates of fine-scale recombination across the genome to obtain
probabilities for each possible genotype call at an untyped locus and takes the uncertainty
of imputed genotypes into account in tests for association at the locus. Several other
imputation programs are also now available [47-50]. Whatever software is used, it is important
to choose an appropriate external reference population as the source of LD information.
Recent research suggests that, in European-descent populations, common SNPs can be
reliably imputed using the single CEU HapMap reference panel, while mixtures of at least two
HapMap panels produce the highest imputation accuracy for most other populations [51].
Accurate imputation at rare SNPs will likely require larger reference samples beyond what
is currently available from HapMap.
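One common way to carry imputation uncertainty forward is to summarize the genotype probabilities at a locus as an expected allele count (a "dosage") together with its variance. The sketch below illustrates that idea only; it is not the method of Marchini et al. [46], and the function name and the (AA, AB, BB) ordering are assumptions made for this example.

```python
def expected_dosage(genotype_probs):
    """Summarize imputed genotype probabilities at one locus.

    genotype_probs is (P(AA), P(AB), P(BB)); the ordering is an assumption
    of this sketch. Returns the expected count of allele B and its variance.
    """
    p_aa, p_ab, p_bb = genotype_probs
    total = p_aa + p_ab + p_bb
    if min(genotype_probs) < 0 or abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must be non-negative and sum to 1")
    dosage = p_ab + 2.0 * p_bb                    # E[g], g = copies of allele B
    variance = (p_ab + 4.0 * p_bb) - dosage ** 2  # E[g^2] - E[g]^2
    return dosage, variance


# A confidently imputed genotype has zero variance; an uncertain one does not.
print(expected_dosage((0.0, 0.0, 1.0)))
print(expected_dosage((0.1, 0.3, 0.6)))
```

A larger variance flags a genotype call that should carry less weight in a downstream association test.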
Gene-gene interaction analysis is also of great importance for complex disease studies. It
is widely hypothesized that complex diseases arise in part because of such epistatic effects.
Specific interactions between SNP loci may be readily tested within the logistic regression
framework described above, using a standard product term for the interaction. Other
methods for testing interactions include the multifactor dimensionality reduction (MDR) method
[52-54], an extension of MDR to allow covariates [55], and the recursive partitioning
method (RPM) [56, 57].
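The product-term test can be illustrated in its simplest form. Assuming each SNP is coded as a binary carrier indicator (0 or 1) rather than an allele count, the logistic model with both main effects and the product term is saturated. The interaction coefficient is then the log of the ratio of odds ratios across the four genotype cells, with a Wald standard error given by the square root of the summed reciprocal cell counts. The function name and the counts below are hypothetical; a real analysis with 0/1/2 coding or covariates would fit the model with a regression package instead.

```python
from math import erfc, log, sqrt


def interaction_log_or(counts):
    """Wald test of the product term in a saturated 2x2 logistic model.

    counts maps (g1, g2), with g1, g2 in {0, 1}, to (n_cases, n_controls).
    Returns (beta3, standard_error, two_sided_p).
    """
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    if sorted(counts) != cells:
        raise ValueError("counts must have exactly the four (g1, g2) cells")
    beta3 = 0.0
    variance = 0.0
    for (g1, g2), (cases, controls) in counts.items():
        if cases <= 0 or controls <= 0:
            raise ValueError("all cell counts must be positive")
        sign = 1 if g1 == g2 else -1   # + for (0,0) and (1,1); - otherwise
        beta3 += sign * log(cases / controls)
        variance += 1.0 / cases + 1.0 / controls
    se = sqrt(variance)
    z = beta3 / se
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided normal tail probability
    return beta3, se, p_value


# Hypothetical counts: (cases, controls) for each carrier combination.
table = {(0, 0): (100, 100), (1, 0): (200, 100),
         (0, 1): (150, 100), (1, 1): (600, 100)}
print(interaction_log_or(table))
```

When the odds in the double-carrier cell equal the product of the two single-carrier odds ratios times the baseline odds, the estimate is zero, which is the null hypothesis of no interaction on the log-odds scale.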
Multiple testing
Multiple testing and potential inflation of false positive rates are not new concerns for
statistical genetics and gene mapping. Nevertheless, with the advent of large-scale GWAS
designs, the problem of potential false positives can seem especially pressing. The traditional
statistical significance level of 0.05 is certainly too permissive if applied to each test without
correcting for the number of tests, and yet, in contrast, a 0.05 experiment-wide error rate
may seem overly conservative when investigators have invested considerable resources into
generating the data.
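A short calculation shows the scale of the problem. The sketch below assumes independent tests, which overstates the effective number of tests when SNPs are in LD, and uses a hypothetical panel of 500,000 SNPs; both are assumptions of this example, not figures from the text.

```python
from math import expm1, log1p


def expected_false_positives(alpha, n_tests):
    """Expected number of null tests declared significant at level alpha."""
    return alpha * n_tests


def familywise_error_rate(alpha, n_tests):
    """P(at least one false positive) for independent null tests.

    Computes 1 - (1 - alpha)**n_tests in a numerically stable way.
    """
    return -expm1(n_tests * log1p(-alpha))


n_snps = 500_000  # hypothetical number of SNPs tested

# Per-test level 0.05 with no correction.
print(expected_false_positives(0.05, n_snps))
print(familywise_error_rate(0.05, n_snps))

# Per-test level chosen so the experiment-wide rate is about 0.05.
print(familywise_error_rate(0.05 / n_snps, n_snps))
```

At an uncorrected per-test level of 0.05, at least one false positive is essentially certain, while dividing the level by the number of tests keeps the experiment-wide rate just under 0.05.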
The experiment-wide error rate may be estimated using permutation to generate an
empirical p -value; similarly, Bonferroni-style corrections that account for correlations (LD)
between SNPs [58, 59] may be applied to obtain an experiment-wide significance level. An
important alternative approach to determine which SNPs to label as 'findings worth following
up' is based on controlling the false discovery rate (FDR). Unlike the significance level,
which is the proportion of results that would be declared positive if the null hypothesis
were true, the FDR is the expected proportion of false positive results among all the declared positives.
The FDR is thus arguably more relevant than the significance level for studies in which
resources will be invested in following up the declared positive results. A low FDR would
improve expectations that most of that follow-up effort will 'pay off.' Publicly available
programs to calculate FDRs include QVALUE (http://faculty.washington.edu/jstorey/qvalue) [60].
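The contrast between experiment-wide control and FDR control can be sketched with the Benjamini-Hochberg step-up procedure, set beside a plain Bonferroni correction. Benjamini-Hochberg is used here because it is short to state; it is not the q-value estimator implemented in QVALUE, and the p-values below are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests significant at experiment-wide level alpha (no LD adjustment)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, fdr=0.05):
    """Flag tests declared positive while controlling the FDR at the given level."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    declared = [False] * m
    for idx in order[:cutoff_rank]:   # every test up to that rank is declared
        declared[idx] = True
    return declared


# Made-up p-values for ten SNPs.
p_vals = [0.205, 0.001, 0.039, 0.008, 0.041, 0.042, 0.060, 0.074, 0.212, 0.216]
print(sum(bonferroni(p_vals)), "declared by Bonferroni")
print(sum(benjamini_hochberg(p_vals)), "declared by Benjamini-Hochberg")
```

The FDR procedure never declares fewer positives than Bonferroni at the same nominal level, which is why it suits studies where declared positives go on to a follow-up stage.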
Power
An integral part of planning a genetic mapping study is estimating the sample size required
to detect a given effect. The power of a test is defined as the probability of rejecting the
null hypothesis at the specified significance level α when that null hypothesis is false.
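As a concrete instance of this definition, the sketch below approximates the power of a two-sided comparison of allele frequencies between cases and controls using the normal approximation for a difference of two proportions. It assumes each individual contributes two independent alleles; the function name, frequencies, and sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(p_case, p_control, n_cases, n_controls, alpha=0.05):
    """Approximate power to detect an allele frequency difference.

    p_case and p_control are the true allele frequencies; each individual
    is assumed to contribute two independent alleles.
    """
    n1, n0 = 2 * n_cases, 2 * n_controls            # allele counts per group
    p_bar = (n1 * p_case + n0 * p_control) / (n1 + n0)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n0))
    se_alt = sqrt(p_case * (1 - p_case) / n1 + p_control * (1 - p_control) / n0)
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    delta = abs(p_case - p_control)
    # Probability that the test statistic falls in either rejection tail.
    upper = 1 - std_normal.cdf((z_crit * se_null - delta) / se_alt)
    lower = std_normal.cdf((-z_crit * se_null - delta) / se_alt)
    return upper + lower


# Hypothetical design: frequency 0.35 in cases versus 0.30 in controls.
for n in (500, 1000, 2000):
    print(n, round(allelic_test_power(0.35, 0.30, n, n), 3))
```

When the two frequencies are equal, the function returns the significance level itself, as the definition requires; power then rises toward 1 as the sample size or the frequency difference grows.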
For the simple chi-square test for allele frequency differences between cases and
controls, power (or necessary sample sizes for a desired level of power) may be