Biomedical Engineering Reference
Method
Study and Experiment Design
1 If a large-scale, genome-wide study is planned, c carefully select the most appropriate
genotyping platform.
2 If a modest-sized (e.g. candidate gene) study is planned, d select SNPs that
cover the regions of interest (e.g. by tagging all common SNPs at a chosen LD
threshold).
3 Include duplicate samples e and individuals with known genotype (e.g. Centre de
Polymorphisme Humaine, CEPH controls) on genotyping plates.
4 Include both case samples and control samples (randomly selected and distributed
among the wells) in each genotyping plate to prevent spurious association signals
driven by potential plate effects (e.g. if differences in plate handling result in
systematic biases in genotype calls on different plates).
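The plate layout described in step 4 can be sketched as follows. This is a minimal illustration, not a laboratory standard: the function name `assign_plates`, the 96-well default, and the fixed random seed are assumptions, and the sketch does not reserve wells for the duplicate samples and CEPH controls from step 3.

```python
import random

def assign_plates(cases, controls, wells_per_plate=96, seed=0):
    """Spread cases and controls evenly over plates, then randomize well order.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    cases, controls = list(cases), list(controls)
    rng.shuffle(cases)
    rng.shuffle(controls)

    # Label each sample, keeping cases and controls in two contiguous runs.
    labelled = [(s, "case") for s in cases] + [(s, "control") for s in controls]

    # Ceiling division gives the number of plates needed.
    n_plates = max(1, -(-len(labelled) // wells_per_plate))
    plates = [[] for _ in range(n_plates)]

    # Deal samples round-robin, so per-plate case counts (and control counts)
    # differ by at most one and no plate exceeds its capacity.
    for i, item in enumerate(labelled):
        plates[i % n_plates].append(item)

    # Randomize well positions within each plate.
    for plate in plates:
        rng.shuffle(plate)
    return plates
```

Dealing the shuffled samples round-robin, rather than filling one plate at a time, is what keeps the case/control ratio similar on every plate; the final shuffle keeps status from correlating with well position.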
Quality Control (Cleaning) for Genotype Data f
5 Calculate genotyping call rates (percent of called (non-missing) genotypes) per sample
and per SNP. (In PLINK, use the '--missing' option.)
6 Among all samples, check for duplicated samples (samples that share a very high
proportion of identical genotypes), g without referring to sample status or the labeling
of planned duplicates. Confirm that planned duplicates do have matching genotypes.
Keep only one sample from each set of duplicates. h (In PLINK, use the
'--genome' option.)
7 Among all samples, check for (unexpected) relatives (samples whose proportion
of shared alleles is consistent with a specific family relationship). (In PLINK, use the
'--genome' option.)
8 If X chromosome genotypes are available, check all samples to ensure that the reported
sex matches the X chromosome genotype calls (e.g. no heterozygous X chromosome
genotypes for males). i (In PLINK, use the '--check-sex' option.)
9 Check for effects of genotyping plate/batch on SNP call rates and allele
frequencies. j
10 After removing the problem data/samples identified in steps 6-9, again calculate call rates
(percent of called (non-missing) genotypes) per sample and per SNP. (In PLINK,
use the '--missing' option.)
11 Filter samples, then SNPs, to keep only those with call rates above an acceptable
threshold.
12 Check self-reported race against genetically inferred ancestry, using a program such as EIGENSTRAT or STRUCTURE. k
13 Compute HWE (by race/ethnicity for studies of more than one racial/ethnic group) for
each SNP. l (In PLINK, use the '--hardy' option.)
14 Compute allele frequencies by race; compare with known allele frequencies from dbSNP
when available.
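To make the quantities in steps 5, 11 and 13 concrete, the sketch below computes call rates, filters samples and then SNPs, and evaluates a Hardy-Weinberg chi-square statistic. It is an illustration under stated assumptions, not a substitute for PLINK: the function names, the genotype coding (0/1/2 allele counts, `None` for a missing call) and the threshold arguments are hypothetical, and the chi-square goodness-of-fit statistic is the textbook test, which is not necessarily the test that PLINK's '--hardy' option reports.

```python
def call_rates(genotypes):
    """Return (per-sample, per-SNP) call rates.

    genotypes: list of per-sample lists; each entry is 0, 1 or 2
    (allele count) or None for a missing call (assumed coding).
    """
    n_samples = len(genotypes)
    n_snps = len(genotypes[0])
    per_sample = [sum(g is not None for g in row) / n_snps for row in genotypes]
    per_snp = [
        sum(row[j] is not None for row in genotypes) / n_samples
        for j in range(n_snps)
    ]
    return per_sample, per_snp


def filter_by_call_rate(genotypes, sample_min, snp_min):
    """Drop low-call-rate samples first, then low-call-rate SNPs (step 11)."""
    per_sample, _ = call_rates(genotypes)
    kept_rows = [row for row, rate in zip(genotypes, per_sample) if rate >= sample_min]
    if not kept_rows:
        return []
    # SNP call rates are recomputed on the retained samples only.
    _, per_snp = call_rates(kept_rows)
    kept_cols = [j for j, rate in enumerate(per_snp) if rate >= snp_min]
    return [[row[j] for j in kept_cols] for row in kept_rows]


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) comparing genotype counts with HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

For a study with more than one racial/ethnic group, `hwe_chi_square` would be applied to the genotype counts of each group separately, as step 13 describes.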