the estimation accuracy for missing SNPs, and concluded that the correlation
between the actual and imputed allele frequencies is higher than 0.9925 in the
study. In this article, we further evaluate the method by assessing the
performance of association tests on the case-control data of the Genetic
Association Information Network (GAIN) schizophrenia study in the European
American population. Section 2 gives a brief description of the MiDCoP
method. Section 3 briefly describes the GAIN data and the general algorithm
implemented for our evaluation. Section 4 summarizes the performances of the
association statistics based on imputed and actual allele frequencies of SNPs using
the GAIN data. Section 5 compares the association test results using different
reference data sets. Section 6 gives a brief summary and conclusion.
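To make concrete what an association statistic computed from allele frequencies can look like, here is a minimal sketch of a 1-degree-of-freedom allelic chi-square test on a 2x2 table of allele counts. This passage does not say which association statistic the study uses, so the choice of test is an assumption made for illustration only. The function name `allelic_chi_square` and its parameters are hypothetical.

```python
def allelic_chi_square(p_case, p_control, n_case, n_control):
    """Pearson chi-square statistic (1 df) for a 2x2 allele-count table.

    p_case, p_control : frequency of one allele in cases and in controls
                        (actual or imputed)
    n_case, n_control : number of individuals; each carries two alleles

    Assumption: an allelic test is used here purely as an illustration.
    """
    # Expected allele counts implied by the frequencies.
    a = 2 * n_case * p_case              # cases, allele of interest
    b = 2 * n_case * (1 - p_case)        # cases, other allele
    c = 2 * n_control * p_control        # controls, allele of interest
    d = 2 * n_control * (1 - p_control)  # controls, other allele

    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column total is zero, so the table carries no signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Same SNP, statistic from actual versus imputed frequencies.
stat_actual = allelic_chi_square(0.30, 0.20, 100, 100)
stat_imputed = allelic_chi_square(0.29, 0.20, 100, 100)
```

Comparing `stat_actual` with `stat_imputed` across many SNPs is one simple way to see how much imputation error carries over into the test statistic.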
2 The MiDCoP Method
The idea behind the Minimum Deviation of Conditional Probability (MiDCoP) method
is to impute the allele frequencies of untyped SNPs in the study sample
by utilizing the allele frequencies of neighboring SNPs and haplotype frequencies
from an external reference set such as the HapMap reference set (The International
HapMap Project [11]). The best pair of neighboring SNPs is determined by
maximizing a multilocus information score (MIS). Gautam [10] proposed
five different MISs. In this article, we adopt the best MIS recommended in
[10], namely the Mutual Information Ratio (MIR, [12]). The MiDCoP algorithm
derived by Gautam [10] consists of the following three steps:
1) SNPs Selection: Identify a set of flanking SNPs in the neighborhood of the
untyped SNP X that maximize MIR based on the reference set. Let L = {L_1, L_2,
…, L_u} be the sequence of SNPs in the neighborhood of X that are common to
both the reference set and the sample set and are in linkage disequilibrium
with X. Our goal is to obtain a pair {A, B} ⊂ L such that the MIR between
{A, B} and {A, X, B} in the reference set is maximized for the fixed SNP
X. Here, the SNPs {A, B} need not be in sequential order by base pair
position.
2) Haplotype Frequency Estimation: Once the optimal pair {A, B} is
determined in step 1, this step estimates the haplotype frequencies for the
pair {A, B} in the sample.
3) Allele Frequency Estimation: The allele frequency of the untyped SNP X in
the sample is estimated as a weighted sum of the haplotype frequencies
estimated in step 2.
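Step 3 can be sketched as follows. The text only says the estimate is a weighted sum of the sample haplotype frequencies. The sketch assumes the weights are the conditional probabilities of the X allele given each {A, B} haplotype, taken from the reference set. That choice is suggested by the method's name but is not confirmed in this passage. The function name and the dictionary layout are hypothetical.

```python
def impute_allele_frequency(sample_hap_freq, ref_joint_freq):
    """Estimate the frequency of allele 1 at the untyped SNP X.

    sample_hap_freq : dict mapping (a, b) -> frequency of the {A, B}
                      haplotype in the study sample (output of step 2)
    ref_joint_freq  : dict mapping (a, x, b) -> frequency of the
                      {A, X, B} haplotype in the reference set

    Assumption: each weight is P(X = 1 | A = a, B = b) in the reference.
    """
    estimate = 0.0
    for (a, b), p_sample in sample_hap_freq.items():
        p_x0 = ref_joint_freq.get((a, 0, b), 0.0)
        p_x1 = ref_joint_freq.get((a, 1, b), 0.0)
        p_ab_ref = p_x0 + p_x1
        if p_ab_ref == 0.0:
            # Haplotype unseen in the reference: it contributes nothing.
            continue
        weight = p_x1 / p_ab_ref  # conditional probability from reference
        estimate += weight * p_sample
    return estimate


# Toy reference frequencies for {A, X, B} (they sum to 1).
reference = {
    (0, 0, 0): 0.40, (0, 1, 0): 0.10,
    (0, 0, 1): 0.10, (0, 1, 1): 0.10,
    (1, 0, 0): 0.05, (1, 1, 0): 0.05,
    (1, 0, 1): 0.00, (1, 1, 1): 0.20,
}
# Toy sample haplotype frequencies for {A, B}.
sample = {(0, 0): 0.40, (0, 1): 0.20, (1, 0): 0.20, (1, 1): 0.20}

imputed = impute_allele_frequency(sample, reference)
```

In this toy input the reference conditionals are 0.2, 0.5, 0.5 and 1.0 for haplotypes 00, 01, 10 and 11, so the estimate is 0.4(0.2) + 0.2(0.5) + 0.2(0.5) + 0.2(1.0) = 0.48.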
The Mutual Information Ratio (MIR) is defined as follows. Let S = {S_1,
S_2, …, S_n} and T = {T_1, T_2, …, T_m} be two disjoint sets of n and m (bi-allelic)
SNPs with the population haplotype frequencies given by the vectors
θ = (θ_1, θ_2, …, θ_s) and ϕ = (ϕ_1, ϕ_2, …, ϕ_t), respectively. The unknown parameters