$\phi$ and $\theta$, are estimated by $\hat{\phi}$ and $\hat{\theta}$ through the reference sets in our study. The Shannon entropies of the discrete random variables S and T are given, respectively, by

$$e_S = -\sum_{j \ge 1} \phi_j \log \phi_j \quad \text{and} \quad e_T = -\sum_{i \ge 1} \theta_i \log \theta_i .$$
If $e_{ST}$ is the entropy of the joint distribution, the mutual information between S and T is defined as

$$\mathrm{MI}(S,T) = e_S + e_T - e_{ST},$$

which is bounded by $\min(e_S, e_T)$. The MIR is defined as the normalized mutual information [12] and is given as

$$\mathrm{MIR}(S,T) = \mathrm{MI}(S,T) / \min(e_S, e_T).$$

The MIR measure can be considered as the shared information between the two sets of haplotypes. It is symmetric in S and T, i.e., MIR(S,T) = MIR(T,S). For a missing SNP X, the MIR is computed between the set S = {A, B} and the missing SNP {X}. The flanking SNP selection step chooses the pair {A, B} that estimates the haplotype distribution for the set {A, X, B} with minimal loss of information.
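As an illustration of the definitions above, the following minimal Python sketch computes MIR from empirical frequencies and uses it to pick a flanking pair. It is not the MiDCoP implementation: the function names (`entropy`, `mir`, `select_flanking_pair`), the encoding of reference haplotypes as strings of alleles, and the reading of "minimal loss of information" as "largest MIR" are all assumptions made for this example. The natural logarithm is used, which does not affect MIR because the base cancels in the ratio.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy -sum(p * log p); zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mir(pairs):
    """MIR(S,T) = MI(S,T) / min(e_S, e_T) from observed (s, t) pairs.

    Each pair is one observation, e.g. one reference haplotype.
    Frequencies are estimated by simple counting (an assumption).
    """
    n = len(pairs)
    e_s = entropy(c / n for c in Counter(s for s, _ in pairs).values())
    e_t = entropy(c / n for c in Counter(t for _, t in pairs).values())
    e_st = entropy(c / n for c in Counter(pairs).values())
    mi = max(0.0, e_s + e_t - e_st)  # clamp tiny negative rounding error
    denom = min(e_s, e_t)
    return mi / denom if denom > 0 else 0.0


def select_flanking_pair(haplotypes, x, left, right):
    """Return (score, a, b) for the pair with the largest MIR against SNP x.

    haplotypes: phased reference haplotypes as strings, one allele per SNP.
    left, right: candidate SNP indices on either side of x (hypothetical input).
    """
    best = None
    for a in left:
        for b in right:
            pairs = [((h[a], h[b]), h[x]) for h in haplotypes]
            score = mir(pairs)
            if best is None or score > best[0]:
                best = (score, a, b)
    return best


if __name__ == "__main__":
    # Toy reference set: SNP 2 is 'missing'; it is determined jointly by SNPs 1 and 3.
    toy = ["00000", "00110", "01100", "01010"]
    print(select_flanking_pair(toy, x=2, left=[0, 1], right=[3, 4]))
```

In the toy reference set, SNPs 0 and 4 are monomorphic and carry no information, while SNPs 1 and 3 together determine the allele at SNP 2, so the pair (1, 3) is selected.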
3 Data Source
To evaluate and compare the performance of the MiDCoP method under different scenarios, we analyze the case-control GWAS data from the Genetic Association Information Network (GAIN) study of schizophrenia in the European American population, obtained from the database of Genotypes and Phenotypes (dbGaP) (from: http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap, [13]). This data set (dbGaP analysis accession: phs000021.v3.p2) consists of 1,351 schizophrenia cases and 1,378 controls from the European American population, genotyped using the Affymetrix 6.0 array [13]. The HapMap III (The International HapMap Consortium, [11]) phase-known data for the CEU population (U.S. residents of northern and western European ancestry) are used as the reference set. The reference set contains 234 haplotypes.
4 Overall Performance of MiDCoP Method Using the GAIN Data Set
Our purpose is to investigate the accuracy of the association test when the allele frequencies in cases and controls are imputed using the MiDCoP method. The allele-based tests [14] are applied for this purpose. For comparison, we treat each SNP in turn as missing, estimate the allele frequency of the 'missing' SNP, and compute the allele-based association test using both the actual and the imputed allele frequency for each SNP. The resulting test statistics and corresponding p-values are then compared. The evaluation is carried out on several genomic regions using the GAIN Schizophrenia GWAS data. The evaluation procedure is described below.
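To make the comparison of actual and imputed results concrete, the sketch below computes an allele-based test from allele frequencies and sample sizes. It assumes that the allele-based test of [14] is the standard Pearson chi-square test with one degree of freedom on the 2x2 table of allele counts (cases/controls by allele); the function name `allele_test` and the allele frequencies in the usage example are hypothetical, and only the sample sizes (1,351 cases, 1,378 controls) come from the data description.

```python
import math


def allele_test(freq_case, freq_control, n_cases, n_controls):
    """Pearson chi-square (1 df) on the 2x2 allele-count table.

    freq_case, freq_control: frequency of one allele in cases and controls
    (either observed or imputed). Returns (chi2, p_value).
    """
    # Each diploid individual contributes two alleles.
    a = 2 * n_cases * freq_case              # allele of interest, cases
    b = 2 * n_cases * (1 - freq_case)        # other allele, cases
    c = 2 * n_controls * freq_control        # allele of interest, controls
    d = 2 * n_controls * (1 - freq_control)  # other allele, controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table (monomorphic SNP): no evidence
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    n_cases, n_controls = 1351, 1378
    # Hypothetical allele frequencies for one SNP, for illustration only.
    actual = allele_test(0.32, 0.29, n_cases, n_controls)
    imputed = allele_test(0.31, 0.29, n_cases, n_controls)
    print("actual :", actual)
    print("imputed:", imputed)
```

Running the same function on the actual and on the imputed frequencies of a SNP yields the two test statistics and p-values that are then compared.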