2.2.2.7 Variant detection
Once reads are assembled on a reference sequence, variants can be detected by pairwise
comparisons between the reads and the reference sequence. If an assembly was generated
using the phredPhrap script (see Protocol 2.5), then Polyphred [19] can be used to detect
variants (see Protocol 2.6). This program, also developed at the University of Washington,
is one of the most widely used tools for mutation detection. It detects heterozygous and
homozygous SNPs as well as indels, scores each variant, and provides relevant genotypes
from the read basecalls. Both the phredPhrap suite and Polyphred, while intuitive, require
basic knowledge of Linux/UNIX. Alternatively, the NovoSNP tool [21] (Figure 2.1) can perform
variant discovery and visualization in a more user-friendly interface (see Protocol 2.7).
While less scalable than Polyphred, NovoSNP runs on Windows, Mac or Linux computers
and requires only a reference sequence (FASTA file) and sequence traces (in binary format).
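The idea of pairwise comparison between a read and the reference can be illustrated with a toy example. This is only a sketch of the comparison step, not the method used by Polyphred or NovoSNP, which work from trace data and base qualities. It assumes the read and the reference have already been aligned to the same length, with '-' marking alignment gaps; the function name compare_aligned is hypothetical.

```shell
# Report columns where an aligned read differs from the aligned reference.
# Assumption: both strings have equal length and use '-' for alignment gaps.
# Output: 1-based alignment column, reference base, read base (tab-separated).
compare_aligned() {
    ref=$1
    qry=$2
    if [ "${#ref}" -ne "${#qry}" ]; then
        echo "error: aligned sequences differ in length" >&2
        return 1
    fi
    awk -v ref="$ref" -v qry="$qry" 'BEGIN {
        for (i = 1; i <= length(ref); i++) {
            r = substr(ref, i, 1)
            q = substr(qry, i, 1)
            if (r != q) printf "%d\t%s\t%s\n", i, r, q
        }
    }'
}

# Example: one substitution (column 4) and one deleted base (column 7).
#   compare_aligned "ACGTACGT" "ACGAAC-T"
# prints:
#   4	T	A
#   7	G	-
```

A real variant caller additionally weighs base quality and, for heterozygous sites, the shape of the trace peaks, which a plain string comparison cannot capture.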
PROTOCOL 2.6 Variant detection in Phrap assemblies with Polyphred
Equipment and reagents
• Directory structures, Phred output and assemblies for sequence traces and reference
sequence from running phredPhrap (see Protocol 2.5)ᵘ
• The Polyphred program.ᵛ
Method
1 Run the phredPhrap script from the edit_dir subdirectory if you have not already done
so (see Protocol 2.5).
2 Run the Polyphred programʷ from the edit_dir subdirectory:
(a) cd edit_dir/
(b) polyphred -ace [ace_file] -refcomp [refseq_id] [options] ˣ
3 Review the Polyphred output.ʸ
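Step 2 can be wrapped in a small shell function. The following is a minimal sketch, assuming that the option spellings given in this protocol (-ace, -refcomp, and the three options recommended for SNP detection in the Notes) match the installed Polyphred version. The function name polyphred_cmd and the example file and sequence names are hypothetical. The function only prints the command, so it can be inspected before it is run from edit_dir.

```shell
# Print the Polyphred command line for step 2 with the SNP-detection options
# recommended in the Notes. Arguments are placeholders supplied by the caller:
#   $1 = assembly (.ace) file, $2 = identifier of the reference sequence.
polyphred_cmd() {
    ace_file=$1
    refseq_id=$2
    if [ -z "$ace_file" ] || [ -z "$refseq_id" ]; then
        echo "usage: polyphred_cmd ACE_FILE REFSEQ_ID" >&2
        return 1
    fi
    printf 'polyphred -ace %s -refcomp %s -t genotype -quality 25 -score 25\n' \
        "$ace_file" "$refseq_id"
}

# Example (hypothetical names):
#   cd edit_dir/
#   polyphred_cmd project.ace refseq
# prints:
#   polyphred -ace project.ace -refcomp refseq -t genotype -quality 25 -score 25
```

Printing rather than executing keeps the sketch usable on machines where Polyphred is not installed; the printed line can be pasted into the shell once the file names have been checked.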
Notes
ᵘ If the phredPhrap script ran successfully, there should be four subdirectories:
chromat_dir, edit_dir, phd_dir and poly_dir. There should be one file per trace in the
chromat_dir, phd_dir and poly_dir subdirectories. There should also be an assembly (.ace)
file in the edit_dir folder.
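The layout described in this note can be checked before running Polyphred. The following is a rough sketch under stated assumptions: the function names count_entries and check_layout are hypothetical, equal entry counts in the three per-trace directories are used as a proxy for "one file per trace", and any file in edit_dir whose name contains .ace is counted as an assembly file.

```shell
# Count the non-hidden entries in a directory (symlinked files count too).
count_entries() {
    n=0
    for f in "$1"/*; do
        if [ -e "$f" ]; then n=$((n + 1)); fi
    done
    echo "$n"
}

# Check the project layout; $1 is the directory holding the four subdirectories.
# Prints one line per problem found and returns non-zero if there is any.
check_layout() {
    project=$1
    problems=0

    # The four subdirectories named in the note must exist.
    for d in chromat_dir edit_dir phd_dir poly_dir; do
        if [ ! -d "$project/$d" ]; then
            echo "missing directory: $d"
            problems=$((problems + 1))
        fi
    done
    if [ "$problems" -gt 0 ]; then return 1; fi

    # Proxy for "one file per trace": the three counts should be equal.
    n_chromat=$(count_entries "$project/chromat_dir")
    n_phd=$(count_entries "$project/phd_dir")
    n_poly=$(count_entries "$project/poly_dir")
    if [ "$n_chromat" -ne "$n_phd" ] || [ "$n_chromat" -ne "$n_poly" ]; then
        echo "per-trace file counts differ: chromat_dir=$n_chromat phd_dir=$n_phd poly_dir=$n_poly"
        problems=$((problems + 1))
    fi

    # At least one assembly file; the pattern also matches names like x.ace.1.
    n_ace=0
    for f in "$project"/edit_dir/*.ace*; do
        if [ -e "$f" ]; then n_ace=$((n_ace + 1)); fi
    done
    if [ "$n_ace" -eq 0 ]; then
        echo "no .ace file in edit_dir"
        problems=$((problems + 1))
    fi

    if [ "$problems" -gt 0 ]; then return 1; fi
    echo "layout OK: $n_chromat traces, $n_ace assembly file(s)"
}
```

Calling check_layout with the project directory as its argument reports any problem it finds. If a directory holds several versions of a file for the same trace, the counts will differ and the check will flag it even though the run may have succeeded, so a reported mismatch is a prompt to look, not proof of failure.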
ᵛ Available from http://droog.gs.washington.edu/polyphred [19].
ʷ Recommended options for SNP detection include:
• -t genotype: specifies output with Consed-compatible tags and SNP genotypes.
• -quality 25: specifies the quality threshold that bases must meet to be used for variant calling.
• -score 25: specifies the score threshold for variant calling.