Biomedical Engineering Reference
leading to improved power and/or genomic coverage. However, the merging process must
be performed with care. We outline some key steps that need attention when creating a
merged dataset for two distinct genotyped samples.
First, one must determine which SNPs have been genotyped in both samples,
taking into account possible changes in rs (reference SNP) numbers. To
do this accurately, the lists of genotyped SNPs may be submitted to dbSNP in 'batch'
mode (http://www.ncbi.nlm.nih.gov/projects/SNP/). This process will return information on
whether any rs numbers have been changed or consolidated to a single rs number, according
to the latest dbSNP build.
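The bookkeeping for this first step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mapping `merged_into` (old rs number to current rs number) is assumed to have been built beforehand from the dbSNP batch output, the function names are hypothetical, and the rs numbers in the usage note are made up.

```python
def harmonize_rs_ids(snp_ids, merged_into):
    """Map each rs number to its current ID, following chains of merges.

    merged_into: dict of old rs number -> rs number it was merged into
    (assumed to come from a dbSNP batch query; hypothetical input format).
    """
    current = {}
    for rs in snp_ids:
        new = rs
        seen = {new}  # guards against accidental cycles in the mapping
        while new in merged_into and merged_into[new] not in seen:
            new = merged_into[new]
            seen.add(new)
        current[rs] = new
    return current


def common_snps(sample_a, sample_b, merged_into):
    """Return {current_rs: (rs_in_a, rs_in_b)} for SNPs genotyped in both samples."""
    a = harmonize_rs_ids(sample_a, merged_into)
    b = harmonize_rs_ids(sample_b, merged_into)
    a_by_current = {cur: orig for orig, cur in a.items()}
    b_by_current = {cur: orig for orig, cur in b.items()}
    shared = sorted(set(a_by_current) & set(b_by_current))
    return {cur: (a_by_current[cur], b_by_current[cur]) for cur in shared}
```

For example, with the made-up lists `["rs100", "rs111", "rs300"]` and `["rs100", "rs222", "rs400"]` and the mapping `{"rs111": "rs222"}`, the function reports two shared SNPs, one of which appears under different rs numbers in the two samples.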
Second, in each sample, the coding of the alleles must be compared to see whether the
allele coding matches, or whether one sample is using the complementary nucleotides of the
other. This task is straightforward for SNPs other than self-complementary 'A/T' or 'C/G'
SNPs; for example, if both samples call the alleles 'C' and 'T', the allele coding matches,
whereas if one sample uses 'C' and 'T' and the other uses 'G' and 'A', the alleles in one
sample should be flipped to the complementary nucleotides so that the coding is consistent
across both samples before merging. Aligning the allele coding is more difficult for
self-complementary 'A/T' or 'C/G' SNPs, as it can be unclear whether the allele called 'A'
in one sample corresponds to the 'A' or the 'T' allele in the second sample. If information about the
'strand' of the call is available, or if flanking primer sequence is available, this can be used
to check for matching. Checking allele frequencies separately in both groups, and comparing
with frequencies recorded at dbSNP, can help to align alleles as long as the two samples
have similar population histories (e.g. both of European descent) and the allele frequencies
are far from 50%.
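The allele-coding comparison described above can be sketched as follows. This is a minimal illustration, not a replacement for strand information or a dedicated merging tool: the function names are hypothetical, and the threshold `min_distance_from_half` (how far from 50% a frequency must be before it is trusted) is an assumed value chosen only for the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify_allele_coding(alleles_a, alleles_b):
    """Compare the two alleles reported for one SNP in each sample.

    Returns 'match', 'flip', 'ambiguous' (self-complementary A/T or C/G),
    or 'mismatch' (alleles cannot be reconciled by complementing).
    """
    a = frozenset(alleles_a)
    b = frozenset(alleles_b)
    flipped_b = frozenset(COMPLEMENT[x] for x in b)
    if a == b and a == flipped_b:
        return "ambiguous"
    if a == b:
        return "match"
    if a == flipped_b:
        return "flip"
    return "mismatch"


def resolve_ambiguous(freq_a, freq_b, min_distance_from_half=0.1):
    """Use allele frequencies to align a self-complementary SNP.

    freq_a, freq_b: frequency of the allele carrying the same label
    (e.g. 'A') in each sample. Assumes similar population histories.
    Returns 'match', 'flip', or 'unresolved'.
    """
    if (abs(freq_a - 0.5) < min_distance_from_half
            or abs(freq_b - 0.5) < min_distance_from_half):
        return "unresolved"  # too close to 50% to tell the strands apart
    same_side = (freq_a < 0.5) == (freq_b < 0.5)
    return "match" if same_side else "flip"
```

SNPs classified as 'unresolved' or 'mismatch' are candidates for exclusion from the merged dataset unless strand or flanking sequence information is available.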
4.3.1.1 Annotating and displaying SNPs and results
To interpret results from a large-scale gene mapping study, and to create tables and
figures for publication, it is important to annotate the SNPs and their corresponding
statistical results with information about gene locations and other information available
from public databases such as dbSNP and HapMap. However, integrating statistical
results with consistent, up-to-date annotation information on a genome-wide scale (e.g.
for a GWAS) can be laborious. Fortunately, software tools are now available that aid the
presentation of genome-wide association data. The publicly available program WGAViewer
[92] is a user-friendly solution that annotates and provides visualization tools for GWAS
results.
For details on available functions and usage, please refer to the website at
http://people.genome.duke.edu/~dg48/WGAViewer/.
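For small-scale checks, the basic idea of attaching gene locations to association results can be sketched directly. The following is not how WGAViewer works internally; it is a simple illustration that assumes results and gene annotations are already available as lists of dictionaries with the hypothetical field names shown, and that both use the same genome build.

```python
def annotate_results(results, genes):
    """Attach overlapping gene names to each association result.

    results: list of dicts with keys 'snp', 'chrom', 'pos', 'p'
    genes:   list of dicts with keys 'name', 'chrom', 'start', 'end'
    (field names are assumptions made for this sketch)
    Returns a new list sorted by p-value, smallest first.
    """
    annotated = []
    for r in results:
        overlapping = [
            g["name"]
            for g in genes
            if g["chrom"] == r["chrom"] and g["start"] <= r["pos"] <= g["end"]
        ]
        annotated.append({**r, "genes": overlapping})
    return sorted(annotated, key=lambda r: r["p"])
```

A linear scan over all genes for every SNP is adequate for a short list of top hits; a genome-wide annotation would normally use an interval index or a dedicated tool.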
References
1. Risch, N. and Merikangas, K. (1996) The future of genetic studies of complex human diseases.
Science (New York, NY), 273, 1516-1517.
2. Saxena, R., Voight, B.F., Lyssenko, V. et al. (2007) Genome-wide association analysis identifies
loci for type 2 diabetes and triglyceride levels. Science (New York, NY), 316, 1331-1336.
3. Scott, L.J., Mohlke, K.L., Bonnycastle, L.L. et al. (2007) A genome-wide association study of
type 2 diabetes in Finns detects multiple susceptibility variants. Science (New York, NY), 316,
1341-1345.