1) Randomly select a region of 3,000 SNPs from each of the 22 chromosomes
of the GAIN data and drop the SNPs with minor allele frequency (MAF) ≤
0.05 in the reference set.
2) Impute the allele frequency of each SNP in both the case and control data
using the MiDCoP method, treating that SNP as missing. The MIR multilocus
information measure is used to select the optimal pair of SNPs for imputing
the missing SNP.
3) Compute the association test statistic and the corresponding p-value between
case and control for each SNP based on the imputed allele frequencies. Note
that the p-values are transformed to the negative base-10 logarithm, -log10(p).
4) Repeat Step 3 for each SNP using the actual allele frequencies in the
case and control data.
5) Fit a simple linear regression of the -log10 p-values from the imputed
allele frequencies (Y) on the -log10 p-values from the actual allele
frequencies (X). If the imputation is perfect, the intercept is zero, the slope
is one, and the coefficient of determination (R²) is one.
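The steps above, other than the MiDCoP imputation itself, can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The text does not name the association test, so a two-sided two-proportion z-test on allele frequencies is assumed here. The function names (maf_filter, neg_log10_p, fit_line) and the sample-size arguments are hypothetical, and the imputed frequencies are taken as given inputs.

```python
import math


def maf_filter(ref_freqs, threshold=0.05):
    """Step 1 (filter only): indices of SNPs whose minor allele
    frequency in the reference set is strictly above the threshold."""
    keep = []
    for i, f in enumerate(ref_freqs):
        maf = min(f, 1.0 - f)
        if maf > threshold:  # SNPs with MAF <= threshold are dropped
            keep.append(i)
    return keep


def neg_log10_p(freq_case, freq_control, n_case, n_control):
    """Steps 3 and 4: -log10 p-value for one SNP.

    ASSUMPTION: a two-sided two-proportion z-test on allele
    frequencies; the source does not say which test was used.
    n_case and n_control count individuals (two alleles each).
    """
    a_case, a_control = 2 * n_case, 2 * n_control
    pooled = (freq_case * a_case + freq_control * a_control) / (a_case + a_control)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / a_case + 1.0 / a_control))
    if se == 0.0:
        return 0.0  # monomorphic SNP: no evidence of association
    z = (freq_case - freq_control) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    p = max(p, 1e-300)  # guard against underflow before taking the log
    return -math.log10(p)


def fit_line(x, y):
    """Step 5: least-squares fit of y on x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


# Made-up frequencies: (case, control) pairs for four SNPs.
actual = [(0.30, 0.25), (0.10, 0.12), (0.45, 0.40), (0.20, 0.20)]
x = [neg_log10_p(fc, fk, 1000, 1000) for fc, fk in actual]

# With perfect imputation, Y equals X exactly.
intercept, slope, r_squared = fit_line(x, x)
```

In simple linear regression R² is the square of the Pearson correlation between X and Y, so a reported R² can be converted to a correlation by taking its square root.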
Among the 22 chromosomes studied, R² ranges from 0.62 to 0.79 (or, in
terms of correlation, 0.787 to 0.889). The results are further summarized based on
the categorized levels of pairwise linkage disequilibrium (LD) of SNPs. SNPs
with maximum pairwise LD ≥ 0.75 are classified as High LD SNPs. Similarly,
SNPs are labeled Moderate, Low, or Weak LD if the maximum pairwise LD is in
the range [0.5, 0.75), [0.25, 0.5), or [0, 0.25), respectively.
Figure 1 displays the scatter plots of -log10(p-values) for imputed (Y) versus
actual (X) SNPs for all 22 chromosomes.
Fig. 1 Scatter plots of -log10(p-values) for imputed (Y) versus actual (X) SNPs,
all 22 chromosomes
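The LD categories defined above amount to a simple threshold rule, sketched below. The function name ld_category is hypothetical, and the sketch assumes the LD measure lies in [0, 1]; the text does not say which LD measure is used.

```python
def ld_category(max_pairwise_ld):
    """Map a SNP's maximum pairwise LD to the label used in the text.

    ASSUMPTION: the LD measure is on a 0-to-1 scale; the specific
    measure is not stated in the source.
    """
    if not 0.0 <= max_pairwise_ld <= 1.0:
        raise ValueError("maximum pairwise LD must lie in [0, 1]")
    if max_pairwise_ld >= 0.75:
        return "High"
    if max_pairwise_ld >= 0.5:
        return "Moderate"  # [0.5, 0.75)
    if max_pairwise_ld >= 0.25:
        return "Low"       # [0.25, 0.5)
    return "Weak"          # [0, 0.25)


# Made-up LD values, one per SNP.
labels = [ld_category(v) for v in (0.9, 0.6, 0.3, 0.1)]
```

Each interval is closed on the left, so a SNP sitting exactly on a boundary such as 0.75 or 0.5 falls into the higher category.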