The R² values and simple linear regression estimates for Chromosome 1 are
summarized in Table 1, which uses a finer categorization of LD than the
scatter plots in Figure 1. Results for the other chromosomes are similar.
Table 1 Comparison of p-values between imputed (Y) and actual (X) allele frequencies of
3000 SNPs from Chromosome 1 of the GAIN data
LD Group      R²      Intercept  Slope
LD=1          0.9398  0.0088     1.0064
LD[0.9-1.0)   0.7504  0.0441     0.9305
LD[0.8-0.9)   0.6832  0.1163     0.9032
LD[0.7-0.8)   0.4438  0.1938     0.8084
LD[0.6-0.7)   0.4882  0.0979     0.8310
LD[0.5-0.6)   0.7264  0.0784     1.1980
LD[0,0.5)     0.1834  0.2575     0.6206
All           0.7818  0.0637     0.9555
The scatter plots for all 22 chromosomes and the results in Table 1 for
Chromosome 1 are consistent: both indicate that the p-values based on imputed
allele frequencies are biased toward underestimation. Performance appears
adequate when the neighboring LD values are high and deteriorates as the LD
values decrease.
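
The per-group quantities reported in Table 1 (R², intercept, slope) come from regressing the p-values based on imputed allele frequencies (Y) on those based on actual allele frequencies (X). A minimal sketch of that computation follows. It assumes each SNP already carries a single LD value used for grouping; the function names, the record layout, and the way an LD value is assigned to a SNP are illustrative assumptions, not the procedure used to produce the table.

```python
from collections import defaultdict
from statistics import mean


def simple_linear_regression(x, y):
    """Least-squares fit y = intercept + slope * x.

    Returns (r2, intercept, slope), the three quantities in Table 1.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary for the fit to be defined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return r2, intercept, slope


def ld_group(ld):
    """Map an LD value in [0, 1] to the group labels used in Table 1."""
    if not 0.0 <= ld <= 1.0:
        raise ValueError("LD value must lie in [0, 1]")
    if ld == 1.0:
        return "LD=1"
    for lower in (0.9, 0.8, 0.7, 0.6, 0.5):
        if ld >= lower:
            return f"LD[{lower:.1f}-{lower + 0.1:.1f})"
    return "LD[0,0.5)"


def fit_by_ld_group(records):
    """Fit one regression per LD group.

    records: iterable of (ld, actual_p, imputed_p) tuples (hypothetical layout).
    Returns {group_label: (r2, intercept, slope)}.
    """
    groups = defaultdict(list)
    for ld, actual_p, imputed_p in records:
        groups[ld_group(ld)].append((actual_p, imputed_p))
    results = {}
    for label, pairs in groups.items():
        xs = [pair[0] for pair in pairs]
        ys = [pair[1] for pair in pairs]
        results[label] = simple_linear_regression(xs, ys)
    return results
```

With unbiased imputation the fitted line would have intercept 0 and slope 1, so the distance of each row of Table 1 from those values, together with R², gives a compact summary of how well the imputed p-values track the actual ones in that LD group.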
5 Performance of MiDCoP Using Different Reference Sets
As shown in the previous section, the performance of the allele-based association
tests does not appear to be satisfactory for practical use, except when the
pairwise LD values fall in the High LD category (Figure 1(a)). However, the
simulation results reported by Gautam [10] show very high imputation accuracy for
allele frequencies (correlation > 0.9925). This raises an
interesting question: why is the performance based on association tests
unsatisfactory while the performance of imputing allele frequencies is
excellent? This section investigates an important factor that contributes to this
discrepancy.
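
To make the allele-based association test concrete, the sketch below computes a 1-degree-of-freedom Chi-Square statistic and p-value directly from case and control allele frequencies. The exact form of the test is not spelled out here, so this assumes the standard 2x2 allele-count test without continuity correction; the function name and parameters are illustrative.

```python
import math


def allelic_chi_square(freq_case, freq_control, n_case, n_control):
    """2x2 allele-count Chi-Square test (1 df, no continuity correction).

    freq_case, freq_control: frequency of the tested allele in each group.
    n_case, n_control: number of individuals; each contributes two alleles.
    Returns (chi2, p_value).
    """
    alleles_case = 2 * n_case
    alleles_control = 2 * n_control
    a = alleles_case * freq_case        # tested allele, cases
    b = alleles_case - a                # other allele, cases
    c = alleles_control * freq_control  # tested allele, controls
    d = alleles_control - c             # other allele, controls
    total = alleles_case + alleles_control
    col_tested = a + c
    col_other = b + d
    if col_tested == 0 or col_other == 0:
        # The allele is absent or fixed in both groups: no evidence of association.
        return 0.0, 1.0
    chi2 = total * (a * d - b * c) ** 2 / (
        alleles_case * alleles_control * col_tested * col_other
    )
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Calling the function once with the actual frequencies and once with the imputed frequencies for the same SNP gives the (X, Y) pair of p-values compared in Table 1. Because the statistic depends on the difference between the case and control frequencies, imputation errors affect the p-value unless they are identical in both groups.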
In this section, we study the effect of matched and unmatched reference sets on
the accuracy of p-values obtained from the Chi-Square test on the imputed allele
frequencies. We use separate reference sets for cases and controls, with different
levels of matching between each sample and its reference set. First, we need to
create two different reference sets using phased individual-level genotype data.
One is the 'unmatched' reference set, in which the individuals in the
reference set do not overlap with those in the sample data that will be
imputed. The other is the 'matched' reference set. Instead of creating a 100%