1 Introduction
A genome-wide association study investigates the association between genetic
variation and a particular disease by comparing Single Nucleotide Polymorphisms
(SNPs) between individuals with the disease of interest (the case group) and
individuals without it (the control group). Association statistics and their
p-values, computed for each SNP by contrasting cases with controls, are used to
determine whether the SNP is statistically significant in distinguishing diseased
from disease-free individuals. Such studies require genotyping the DNA of every
individual in both the case and control samples. Current genotyping technologies
assay only a limited portion of each individual's DNA sequence, so a large
fraction of SNPs remain untyped. Various imputation methods based on
individual-level data have been developed and successfully used to impute the
missing SNPs. These approaches fall broadly into two categories. The first is the
Hidden Markov Model (HMM)-based approach, which includes IMPUTE [1,2],
MACH [3,4], BEAGLE [5], and fastPhase/BIMBAM [6]. The second is the
tag-SNP-based approach, which includes TUNA [7], WHAP [8], and SNPMstat [9].
Because these methods operate on individual-level DNA sequence data, they are
time-consuming and costly.
Furthermore, because imputation requires individual-level data, studies that
report only summary data are excluded from imputation-based meta-analysis. It is
therefore desirable to develop methods for imputing untyped SNPs from
summary-level data.
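As a concrete illustration of the per-SNP case/control comparison described above, the following minimal sketch computes a Pearson chi-square statistic (1 degree of freedom, no continuity correction) and its p-value from the allele counts of a single SNP. It assumes a simple allelic test on a 2x2 table of allele counts; the text does not prescribe a particular association test, and the function name `allelic_chi_square` and the counts used are hypothetical.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic association test for one SNP (illustrative sketch).

    Builds a 2x2 table of allele counts (rows: cases, controls; columns:
    alternative allele, reference allele) and returns the Pearson chi-square
    statistic (1 df, no continuity correction) and its p-value.
    Assumes every row and column total is positive.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the allele frequency were the same in cases and controls.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical SNP: 1,000 cases and 1,000 controls give 2,000 alleles per group.
chi2, p = allelic_chi_square(case_alt=600, case_ref=1400, ctrl_alt=500, ctrl_ref=1500)
print(f"chi-square = {chi2:.3f}, p-value = {p:.2e}")
```

Each diploid individual contributes two alleles, which is why the group totals are twice the number of individuals; a SNP would be declared significant when its p-value falls below whatever threshold the study adopts.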
The purpose of imputation is to increase the sample size and the coverage of
SNPs, which in turn increases the power of the association test. However,
imputation at the individual level is time-consuming and costly. An alternative
is to impute allele frequencies or p-values directly. Imputing p-values directly
would be the most efficient and least costly option.
However, a p-value is tied to the association test that produced it: different
tests applied to the same data can yield different p-values, so imputing p-values
directly requires fixing the association test in advance. Imputing allele
frequencies therefore seems to be the better alternative: it is efficient and
inexpensive, and it still allows different association tests to be applied to
identify the significant SNPs.
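The argument for imputing allele frequencies rather than p-values can be made concrete. Given a SNP's allele frequencies and the case and control sample sizes, approximate allele counts can be rebuilt and any allele-based test applied afterwards. The sketch below assumes diploid individuals and nonzero cell counts, and applies two different tests (a Pearson chi-square test and a Wald test on the log odds ratio) to the same hypothetical summary data. All function names and inputs are illustrative and are not part of MiDCoP or any cited tool.

```python
import math


def counts_from_frequency(alt_freq, n_individuals):
    """Rebuild approximate allele counts from a summary allele frequency.

    Assumes diploid individuals, i.e. 2 alleles per person.
    Returns (alternative-allele count, reference-allele count).
    """
    n_alleles = 2 * n_individuals
    alt = round(alt_freq * n_alleles)
    return alt, n_alleles - alt


def pearson_chi_square_p(a, b, c, d):
    """1-df Pearson chi-square p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(chi2 / 2))


def wald_log_odds_ratio_p(a, b, c, d):
    """Two-sided Wald test of log(odds ratio) = 0 for the same table.

    Assumes all four cells are nonzero (otherwise the log odds ratio is undefined).
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical summary data: only allele frequencies and sample sizes are known.
case_alt, case_ref = counts_from_frequency(alt_freq=0.10, n_individuals=100)
ctrl_alt, ctrl_ref = counts_from_frequency(alt_freq=0.03, n_individuals=100)

p_chi = pearson_chi_square_p(case_alt, case_ref, ctrl_alt, ctrl_ref)
p_wald = wald_log_odds_ratio_p(case_alt, case_ref, ctrl_alt, ctrl_ref)
print(f"Pearson chi-square p = {p_chi:.4f}, Wald log-OR p = {p_wald:.4f}")
```

With these inputs the two tests return noticeably different p-values from identical allele frequencies. An imputed p-value is therefore only meaningful for the test it was computed under, whereas an imputed allele frequency can feed either test.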
Gautam [10] developed a method called Minimum Deviation of Conditional
Probability (MiDCoP), which imputes the allele frequencies of missing SNPs from
the allele frequencies of neighboring SNPs without using individual-level SNP
information. Its advantages are that (1) it does not require individual-level
genotype data and is therefore much more computationally efficient, and (2) it
can be applied to studies for which only summary-level data, such as SNP allele
frequencies, are available. However, the method must perform well to be a viable
approach in practice. Gautam [10] performed various evaluations regarding