Biomedical Engineering Reference
a database hosted by the National Center for Biotechnology Information (NCBI) in the
USA. As of build 129 (April 2008), dbSNP contained more than 1000 unique variants for
at least 20 species (see Table 2.1).
Organism-specific databases, such as WormBase [8] and FlyBase [9], also contain extensive
collections of known sequence variants. Detailed information about SNPs in the human
genome, including their allele frequencies in different populations, can also be accessed
through the University of California Santa Cruz (UCSC) Genome Browser [10, 11] and the
website of the International HapMap Project [12, 13].
2.2.2 Targeted resequencing for variant discovery
The current gold standard for SNP discovery is direct resequencing of genomic DNA. In
this approach, regions of interest are amplified by the polymerase chain reaction (PCR) and
directly sequenced with automated DNA sequence analyzers. Traditional sequencing platforms,
such as the ABI 3730XL, typically generate sequence 'reads' of 400-600 high-quality
bases. Variants are identified by comparing these reads with the corresponding reference
sequence.
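The comparison step can be illustrated with a minimal Python sketch. It assumes that the read has already been aligned to the reference without gaps and that its 0-based start position is known; the function name find_substitutions and the example sequences are hypothetical. Production pipelines also weigh base quality scores and handle insertions and deletions, which this sketch ignores.

```python
def find_substitutions(reference, read, read_start=0):
    """Return (position, ref_base, read_base) for each mismatching base.

    Assumes the read is aligned to the reference without gaps, beginning
    at 0-based reference position `read_start`. Indels are not handled.
    """
    variants = []
    for i, read_base in enumerate(read.upper()):
        ref_pos = read_start + i
        if ref_pos >= len(reference):
            break  # read runs past the end of the reference
        ref_base = reference[ref_pos].upper()
        if read_base == "N" or ref_base == "N":
            continue  # skip ambiguous base calls
        if read_base != ref_base:
            variants.append((ref_pos, ref_base, read_base))
    return variants


# Hypothetical example: a short read aligned at reference position 2.
reference = "ACGTTGCAAGCT"
read = "GTTGCTAG"
print(find_substitutions(reference, read, read_start=2))  # [(7, 'A', 'T')]
```

Each tuple reports a candidate substitution; in practice a candidate would be accepted only if it is supported by high-quality bases, ideally on reads from both strands.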
2.2.2.1 Sample selection
Several factors should be considered when selecting samples for targeted resequencing.
First, samples should be prioritized by the likelihood that they contain
relevant sequence variants in the regions of interest. One approach is the 'sequencing the
extremes' method, in which samples at each end of a phenotypic spectrum are chosen
for variant discovery. This approach is particularly well suited to studies of quantitative
phenotypes such as blood pressure, drug dosage, and body mass index. The
reasoning is that patients at one end of the spectrum are likely to have variants that confer
susceptibility, whereas patients at the other end of the spectrum are likely to have variants
that confer resistance.
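As an illustration, 'sequencing the extremes' amounts to ranking samples by the quantitative phenotype and taking both tails. The sketch below is one possible way to do this, not a prescribed procedure: the function name select_extremes, the tail fraction, and the sample identifiers and blood pressure values are all hypothetical.

```python
def select_extremes(phenotypes, fraction=0.1):
    """Return (low_tail, high_tail) lists of sample IDs.

    `phenotypes` maps sample ID -> quantitative phenotype value.
    `fraction` is the share of samples taken from each end (0 < fraction <= 0.5).
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    if len(phenotypes) < 2:
        raise ValueError("need at least two samples to define two extremes")
    # Rank sample IDs from lowest to highest phenotype value.
    ranked = sorted(phenotypes, key=phenotypes.get)
    # Take at least one sample from each tail.
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


# Hypothetical systolic blood pressure values (mmHg) for ten samples.
systolic_bp = {
    "S01": 118, "S02": 142, "S03": 101, "S04": 165, "S05": 127,
    "S06": 96, "S07": 133, "S08": 158, "S09": 110, "S10": 121,
}
low, high = select_extremes(systolic_bp, fraction=0.2)
print(low)   # ['S06', 'S03']
print(high)  # ['S08', 'S04']
```

The tail fraction trades discovery power against cost: narrower tails enrich for samples likely to carry variants of large effect, but leave fewer samples to sequence.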
Another important consideration during sample selection is the quality and available
quantity of DNA. Sample quality and purity are especially critical, since contamination (by
other tissue types, other organisms, etc.) can lead to false positives due to noisy sequence
data. The quantity of available DNA from a particular sample (or patient) should also be
considered. Whenever possible, the samples chosen for resequencing should have enough
DNA kept in reserve for validation or further experimentation.
The number of samples to include for resequencing depends on several factors, but is
typically dictated by the number of samples available and the budget for the project. While
DNA resequencing can be costly, the more samples screened for sequence variants, the
greater the probability that key variants will be found.
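The relationship between sample number and discovery probability can be approximated with a simple calculation. The sketch below assumes diploid samples drawn at random from a population in which the variant allele has frequency f, so that each of the 2n sampled chromosomes carries it independently, and it assumes that a variant present in a sample is always detected. Samples chosen by phenotype violate the random-sampling assumption, so the result is only a rough guide. The function name discovery_probability is hypothetical.

```python
def discovery_probability(allele_freq, n_samples, ploidy=2):
    """Probability that at least one sampled chromosome carries the variant.

    Uses 1 - (1 - f)^(ploidy * n), assuming independently sampled
    chromosomes and perfect detection of any variant that is present.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return 1.0 - (1.0 - allele_freq) ** (ploidy * n_samples)


# Compare hypothetical project sizes for a variant with 1% allele frequency.
for n in (24, 48, 96):
    print(n, round(discovery_probability(0.01, n), 3))
```

Under these assumptions, the probability rises steeply with the first few dozen samples and then levels off, which is one way to weigh the cost of additional samples against the expected gain.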
2.2.2.2 Selecting regions of interest
The goal of targeted resequencing for mutation discovery is to choose genomic regions
most likely to harbor variants of interest. There are two principal approaches to candidate
gene selection. The first is knowledge based, in which the published literature and current
understanding of the biology underlying a particular phenotype are used to construct lists
of relevant genes. The second approach relies on experimental results (e.g. gene expression