the summary data of SNPs such as allele frequencies or the p-values of the
association tests directly. A new method, MiDCoP, for imputing the allele
frequencies of missing SNPs was developed in [10]. This article investigates
the performance of the MiDCoP method using case-control GWAS data from the
Genetic Association Information Network (GAIN) schizophrenia study in the
European American population. We first evaluate the association tests by
comparing the p-values obtained from imputed allele frequencies with those
obtained from the actual allele frequencies of SNPs. The results suggest that
MiDCoP performs adequately only when the LD values are high. The second
evaluation compares the performance of 'matched' and 'unmatched' reference
sets. The results indicate that the better the reference set matches the sample
data, the better the method performs.
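To make the first evaluation concrete, the following is a minimal sketch of how an association p-value can be computed from summary allele frequencies alone and then compared between actual and imputed values. It assumes a standard 1-df allelic chi-square test on a 2x2 table of allele counts, which may differ from the test used in the study. The function name, frequencies, and sample sizes are hypothetical and are not taken from the GAIN data.

```python
import math


def allelic_test_p_value(freq_case, freq_control, n_cases, n_controls):
    """P-value of a 1-df allelic chi-square test from summary allele frequencies.

    Assumption: each individual contributes two alleles, so the 2x2 table of
    allele counts can be rebuilt from the frequencies and the sample sizes.
    """
    case_alleles = 2 * n_cases
    control_alleles = 2 * n_controls
    a = freq_case * case_alleles        # test allele count in cases
    b = case_alleles - a                # other allele count in cases
    c = freq_control * control_alleles  # test allele count in controls
    d = control_alleles - c             # other allele count in controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical frequencies: the imputed case frequency is slightly too low.
p_actual = allelic_test_p_value(0.30, 0.25, 1000, 1000)
p_imputed = allelic_test_p_value(0.29, 0.25, 1000, 1000)
print(-math.log10(p_actual), -math.log10(p_imputed))
```

Comparing the two values on the -log10 scale shows how a small error in an imputed frequency can translate into a visibly weaker association signal.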
Several questions remain unanswered in this article. First, in practice it is
not easy to find a reference set that 'matches' the sample data. In most
practical situations, imputation in both cases and controls is carried out with a
single reference set. In our study, we use the reference set from the HapMap
project. It is critical to identify a reference set from a population that matches
the sample data as closely as possible. Besides the HapMap project, one can also
look for a reference set in the 1000 Genomes Project [14]. Second, the results
indicate that the MiDCoP method has an underestimation bias, and further
research is needed to develop a method to adjust for it. One possible
approach is to adjust the conditional probability P(X|A-B) from the reference set
based on a bias adjustment of the allele frequencies of the flanking SNPs in cases
and controls. Another direction for further research is to compare the
performance of the MiDCoP method with that of existing individual-level
imputation methods such as IMPUTE. The authors have made some progress on
this comparison. More analysis and comparisons using existing GWAS data will be
needed.
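As an illustration of the notation, a conditional probability such as P(X|A-B) can be estimated from reference haplotypes and combined with flanking-haplotype frequencies through the law of total probability. The sketch below shows only that generic calculation. It is not the MiDCoP algorithm of [10] and does not implement any bias adjustment. The function names, the 0/1 allele coding, and all numbers are hypothetical, and the flanking A-B haplotype frequencies in the sample are assumed to be available as input.

```python
from collections import Counter


def conditional_allele_prob(reference_haplotypes):
    """Estimate P(X=1 | A-B) from reference haplotypes.

    Each haplotype is a tuple (a, b, x) of 0/1 alleles at the flanking
    SNPs A and B and at the untyped SNP X (hypothetical coding).
    """
    totals = Counter()
    carriers = Counter()
    for a, b, x in reference_haplotypes:
        totals[(a, b)] += 1
        carriers[(a, b)] += x
    return {ab: carriers[ab] / totals[ab] for ab in totals}


def impute_frequency(cond_prob, flanking_freqs):
    """Law of total probability: sum over A-B of P(X=1 | A-B) * P(A-B)."""
    # A-B haplotypes absent from the reference contribute 0 here (an assumption).
    return sum(cond_prob.get(ab, 0.0) * f for ab, f in flanking_freqs.items())


# Hypothetical reference panel of 20 haplotypes.
reference = (
    [(1, 1, 1)] * 6 + [(1, 0, 1)] * 1 + [(1, 0, 0)] * 3
    + [(0, 1, 0)] * 4 + [(0, 0, 0)] * 6
)
cond = conditional_allele_prob(reference)

# Hypothetical flanking-haplotype frequencies in the case sample.
case_flanking = {(1, 1): 0.4, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.2}
print(impute_frequency(cond, case_flanking))
```

Under this reading, any adjustment to the conditional probabilities or to the flanking frequencies feeds directly into the imputed frequency, which is why a mismatch between the reference set and the sample matters.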
Acknowledgement. This research is partially supported by an internal grant from Central
Michigan University. The GWAS data analyzed are the GAIN Schizophrenia data set for
European ancestry from the Database of Genotypes and Phenotypes (dbGaP) (Bethesda, MD:
National Center for Biotechnology Information, National Library of Medicine. Available
from: http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap. dbGaP analysis accession:
pha0002857.v1.p1) and the HapMap III (The International HapMap Project, 2010)
reference panel of the CEU population.
References
1. Marchini, J., Howie, B., Myers, S., McVean, G., Donnelly, P.: A new multipoint
method for genome-wide association studies by imputation of genotypes. Nature
Genetics 39, 906-913 (2007)
2. Howie, B., Donnelly, P., Marchini, J.: A flexible and accurate genotype imputation
method for the next generation of genome-wide association studies. PLoS Genetics 5,
e1000529 (2009)