reference sequence of the human genome. It implements two
algorithms, bwa-short and BWA-SW, depending on the length of the
query sequence. The former works for queries shorter than 200 bp and
the latter for longer sequences up to around 100 kbp. Both algorithms
perform gapped alignment. They are usually more accurate and faster on
queries with low error rates;
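The length-based choice between the two algorithms can be illustrated with a minimal Python sketch. The function name `choose_bwa_algorithm` is hypothetical and the cutoffs are assumptions taken from the approximate figures quoted above (200 bp and around 100 kbp); this is not part of BWA's own interface.

```python
# Assumed cutoffs, taken from the approximate figures in the text.
SHORT_READ_LIMIT_BP = 200       # "shorter than 200 bp"
LONG_READ_LIMIT_BP = 100_000    # "up to around 100 kbp"


def choose_bwa_algorithm(query_length_bp: int) -> str:
    """Return the label of the algorithm suited to a query of this length.

    Hypothetical helper for illustration only; BWA itself does not
    expose a function with this name.
    """
    if query_length_bp <= 0:
        raise ValueError("query length must be positive")
    if query_length_bp < SHORT_READ_LIMIT_BP:
        return "bwa-short"
    if query_length_bp <= LONG_READ_LIMIT_BP:
        return "BWA-SW"
    raise ValueError("query is longer than the range described for BWA-SW")
```

Under these assumptions a 100 bp read would be routed to bwa-short and a 5 kbp contig to BWA-SW.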
SAMtools [20] - SAM (Sequence Alignment/Map) tools are a set of
utilities that manipulate alignments stored in files in the BAM format,
which is the format of the raw data output file of the sequencers.
SAMtools exports to the SAM format, performs sorting, merging and
indexing of the alignments and allows rapid retrieval of reads from
any region;
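To make the idea of retrieving reads from a region concrete, the sketch below parses text SAM records in plain Python and keeps those that overlap a requested interval. It does not use SAMtools or its index; the names `SamRecord`, `parse_sam` and `reads_in_region` are hypothetical, the example records are invented, and the read end is approximated as start plus sequence length, which is an assumption that only holds for ungapped alignments.

```python
from typing import Iterable, Iterator, List, NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name (SAM column 1)
    rname: str   # reference sequence name (SAM column 3)
    pos: int     # 1-based leftmost mapping position (SAM column 4)
    seq: str     # read sequence (SAM column 10)


def parse_sam(lines: Iterable[str]) -> Iterator[SamRecord]:
    """Yield alignment records from SAM text, skipping '@' header lines."""
    for line in lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:      # SAM has 11 mandatory columns
            continue
        yield SamRecord(fields[0], fields[2], int(fields[3]), fields[9])


def reads_in_region(records: Iterable[SamRecord], chrom: str,
                    start: int, end: int) -> List[SamRecord]:
    """Return records overlapping chrom:start-end (1-based, inclusive)."""
    hits = []
    for rec in records:
        # Assumption: ungapped alignment, so end = start + length - 1.
        rec_end = rec.pos + len(rec.seq) - 1
        if rec.rname == chrom and rec.pos <= end and rec_end >= start:
            hits.append(rec)
    return hits


# Invented example records, for illustration only.
EXAMPLE_SAM = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\tchr1\t500\t60\t4M\t*\t0\t0\tTTGA\tIIII",
    "read3\t0\tchr2\t100\t60\t4M\t*\t0\t0\tGGCC\tIIII",
]
```

A linear scan like this is only practical for small files; sorting and indexing are what allow the same query to be answered quickly on large ones.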
Picard [21] - Picard provides Java-based command-line utilities that
allow manipulation of SAM and BAM format files. Furthermore, there is
a Java API (SAM-JDK) for developing new applications able to read
and write SAM files;
VCFtools [22] - the Variant Call Format (VCF, http://vcftools. ) is a specification for storing gene sequence variations
and VCFtools is a package designed for working with these VCF
format files, for example those generated by the 1000 Genomes
Project. VCFtools provides a means for validating, merging and
comparing VCF files and for calculating some basic population genetic statistics;
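As a rough illustration of the kind of summary that can be computed from VCF records, the following Python sketch counts SNPs and indels by comparing the REF and ALT columns. It is not VCFtools code: the function names `classify_allele` and `summarize_vcf` are hypothetical, the example records are invented, and symbolic or missing alleles (such as `<DEL>` or `.`) are simply counted as "other".

```python
from collections import Counter
from typing import Iterable


def classify_allele(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele length (simplified)."""
    if alt.startswith("<") or alt in (".", "*"):
        return "other"                 # symbolic or missing allele
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"                     # e.g. multi-nucleotide substitution


def summarize_vcf(lines: Iterable[str]) -> Counter:
    """Count variant classes over the data lines of a VCF file."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue                   # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]                # column 4: REF
        for alt in fields[4].split(","):   # column 5: ALT, comma-separated
            counts[classify_allele(ref, alt)] += 1
    return counts


# Invented example records, for illustration only.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\trs1\tA\tG\t50\tPASS\t.",
    "1\t200\t.\tAT\tA\t50\tPASS\t.",
    "2\t300\t.\tC\tT,CA\t50\tPASS\t.",
]
```

On the example records this yields two SNPs (A>G, C>T) and two indels (AT>A, C>CA).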
BGZip [23] - BGZip is a block compression utility that writes files in
the gzip-compatible BGZF format. Because the data are compressed in
independent blocks, a compressed file can be indexed and a region
decompressed without reading the whole file;
Tabix [24] - Tabix is a tool that indexes position-sorted files in
tab-delimited formats such as SAM. It allows fast retrieval of features
overlapping specified regions. It is particularly useful for manually
examining local genomic features on the command line and enables
genome viewers to support huge data files and remote custom tracks
over networks;
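Why position sorting helps retrieval can be shown with a small in-memory sketch. This is not Tabix's actual index structure; it is a simplified stand-in that uses binary search over sorted start positions, and the class name `RegionIndex` and the example features are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Feature = Tuple[str, int, int, str]   # (chrom, start, end, name)


class RegionIndex:
    """Toy index over features sorted by chromosome and start position."""

    def __init__(self, features: List[Feature]) -> None:
        self._by_chrom: Dict[str, List[Tuple[int, int, str]]] = {}
        for chrom, start, end, name in features:
            self._by_chrom.setdefault(chrom, []).append((start, end, name))
        # Per-chromosome start positions; input is assumed already sorted.
        self._starts = {
            chrom: [feat[0] for feat in feats]
            for chrom, feats in self._by_chrom.items()
        }

    def query(self, chrom: str, start: int, end: int) -> List[str]:
        """Return names of features overlapping chrom:start-end (inclusive)."""
        feats = self._by_chrom.get(chrom, [])
        # Binary search: features starting after `end` cannot overlap.
        upper = bisect_right(self._starts.get(chrom, []), end)
        return [name for s, e, name in feats[:upper] if e >= start]


# Invented example features, for illustration only.
EXAMPLE_FEATURES: List[Feature] = [
    ("chr1", 100, 200, "geneA"),
    ("chr1", 150, 300, "geneB"),
    ("chr1", 400, 500, "geneC"),
    ("chr2", 100, 200, "geneD"),
]
```

Because the starts are sorted, every feature beginning after the end of the query can be excluded with one binary search instead of a scan of the whole file.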
Variant Effect Predictor [25] - this is a utility from Ensembl that
provides the facility to predict the functional consequences of variants.
Variants can be output as described by Ensembl, NCBI or the Sequence
Genome Analysis Toolkit (GATK) [26] - the GATK is a structured
software library from the Broad Institute that simplifies writing efficient
analysis tools for next-generation sequencing data. It is
also a suite of tools to facilitate working with human medical