Biomedical Engineering Reference
and cost efficiency of several commercial arrays by calculating the fraction of common
(minor allele frequency ≥ 0.05) SNPs that are tagged by SNPs on the chip with an r²
surpassing a range of thresholds [28, 29]. A GWAS design may also benefit from taking into
account coverage of specific groups of genes for which there are strong prior hypotheses for
involvement with the disease of study [30, 31]. Investigators using a commercial genotyping
array may then wish to genotype supplementary SNPs to ensure good coverage of important
target genes, appropriate to the specific disease or class of diseases. For example, recent
studies to assess coverage and develop resources to help supplement current genotyping
arrays have been carried out for cardiovascular [31] and addiction [30] diseases.
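The coverage measure described above can be illustrated with a minimal sketch. It assumes that, for each common SNP in a reference panel, the maximum r² with any SNP on the array has already been computed; the function name, the SNP labels, the example values, and the use of "greater than or equal to" as the comparison are all hypothetical choices made for illustration, not taken from the cited studies.

```python
def coverage(max_r2_by_snp, thresholds=(0.5, 0.8, 1.0)):
    """Fraction of common SNPs tagged at each r^2 threshold.

    max_r2_by_snp maps a SNP label to the largest r^2 observed between
    that SNP and any SNP on the array (assumed precomputed elsewhere).
    """
    values = list(max_r2_by_snp.values())
    if not values:
        raise ValueError("no SNPs supplied")
    # A SNP counts as covered at threshold t if its best r^2 is >= t.
    return {t: sum(r2 >= t for r2 in values) / len(values) for t in thresholds}


# Hypothetical values for four common SNPs (illustration only).
example = {"snp1": 1.0, "snp2": 0.85, "snp3": 0.6, "snp4": 0.2}
print(coverage(example))  # {0.5: 0.75, 0.8: 0.5, 1.0: 0.25}
```

Reporting coverage at several thresholds, rather than at a single cutoff, shows how quickly an array's coverage falls as the tagging requirement becomes stricter.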
In addition to selecting markers which are to be tested for disease association,
case-control studies should genotype markers that may be used to investigate the sample
for potential population stratification, which can lead to false positive association between
markers and disease. This issue is discussed further in the next section.
Data quality control
Before analyzing the genotype data for association with disease status, it is important to
examine the genotype data for possible problems and either resolve discrepancies or remove
problem observations.
For unrelated cases and controls, it is difficult to test for genotyping errors because
there is no family data to permit detection of deviation from Mendelian inheritance
patterns. Nevertheless, there are some useful tools to help clean the genotype data. In addition,
depending on the genotyping platform used, there may be quality control measures
specific to that technology that the investigator should take advantage of; discussion
of these platform-specific issues is beyond the scope of this chapter, and interested
investigators can follow up with individual companies or manufacturers for more details. Such
platform-dependent considerations can include techniques for assessing the genotype
clusters formed by the data when making genotype calls; often these assessments are made by
experienced laboratory personnel in conjunction with statistical tests for deviation from
Hardy-Weinberg equilibrium (HWE) (discussed in more detail below). As an
example, Illumina can provide information about the likely performance of particular SNPs
on its platforms prior to genotyping, which can help investigators
select SNPs at the design stage.
One oft-used quality control technique is to check each genotyped locus for consistency
with HWE. Deviations from HWE can arise for several reasons besides genotyping error, including
inbreeding, selection or even association with the disease under study. However, when HWE
is severely violated in the control sample, it is common for such loci to be viewed with
suspicion and removed from further analysis. To test for significant deviation from HWE,
one may perform a chi-square test, or preferably Fisher's exact test when expected genotype
counts are low, using software such as PEDSTATS [32] or PLINK [33]. For a genome-wide
study covering hundreds of thousands of SNPs, however, it is expected that several SNPs
will have significant HWE p-values simply because of the number of SNPs assayed, so
the HWE information is sometimes used to flag potential problem SNPs for re-examination
of the raw genotype cluster plots, without necessarily removing them from analysis. After
association testing, however, it is worthwhile to check top associated SNPs for consistency
with HWE.
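As a rough illustration of the chi-square version of this test, the sketch below computes the one-degree-of-freedom Pearson statistic for a single biallelic SNP from its three genotype counts. It is a minimal example under stated assumptions, not the procedure implemented in PEDSTATS or PLINK; the function name and the example counts are hypothetical, and no exact test is included, so it should not be relied on when expected genotype counts are low.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi2, p_value) for a 1-df Pearson test of HWE.

    n_aa, n_ab, n_bb are the observed counts of the three genotypes
    at a biallelic SNP (for example, in the control sample).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    # Genotype counts expected under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        # Monomorphic SNP: the test is uninformative, so report no deviation.
        return 0.0, 1.0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: a complete absence of heterozygotes.
chi2, p_value = hwe_chi_square(50, 0, 50)
print(chi2, p_value)
```

The multiple-testing point made above follows from simple arithmetic: if, purely for illustration, 500,000 SNPs were each tested at a threshold of 0.001, about 500,000 × 0.001 = 500 SNPs would be expected to fall below the threshold by chance alone even if every SNP were in HWE.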
For each genotyped marker, the call rate (that is, the rate of successfully called genotypes
among all genotypes attempted) should be examined. Low call rates may be indicative of