FASTA programs can read sequences from MySQL and PostgreSQL
databases. FASTA can also read library subsets; format 10
libraries work like BLASTP -gilist searches, but also allow a
more general strategy for identifying sequence subset identifiers.
Protein databases can use lower-case residues to indicate
low-complexity regions, which are ignored when the -S option is
used. While makeblastdb can be used to produce FASTA format
12 databases, this is rarely necessary, because FASTA can read most
widely used sequence formats.
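
To make the lower-case convention concrete, the short Python sketch below locates the lower-case runs in a protein sequence. It only illustrates how such annotation can be detected in a sequence string; it is not how FASTA itself handles these residues, and the example sequence is hypothetical.

```python
import re


def lowercase_runs(sequence):
    """Return (start, end) index pairs for each run of lower-case residues.

    Indices are 0-based and end-exclusive, like Python slices.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", sequence)]


# Hypothetical protein sequence with one lower-case (low-complexity) run.
example = "MKTAYIAKQRqqqqqqqqqsLEERLGLIEV"
for start, end in lowercase_runs(example):
    print(start, end, example[start:end])
```
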
Command line differences: BLAST and FASTA both offer a diverse
set of command line options that modify the behavior and output
of the programs. Both BLAST and FASTA provide a list of popular
options with the -h option, and a more comprehensive list with the
-help option, e.g., blastp -help. Popular BLAST options are
outlined in Table 5; popular FASTA options are listed in Table 6.
The BLAST programs use command line options (-db, -query,
-matrix) to provide all the information the program needs,
including the name of the query file, database, scoring matrix, etc.
FASTA uses command line options (-s matrix, -f gap-open,
-g gap-extend) to modify default search parameters but expects
the query.file and library.file to be specified after all
program options. Thus, in the command
fastx -s BP62 query.file library.file
the scoring matrix option -s BP62 must precede the query.file and
library.file arguments.
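
The difference in argument conventions can be sketched in Python by building the two argument lists side by side. This is a minimal illustration, not part of either package: the function names, the database name library.db, and the BLOSUM62 matrix value are assumptions chosen for the example, and the commands are only printed, not executed.

```python
def blast_command(program, query_file, database, matrix=None):
    """Build a BLAST argument list; every input is passed as a named option."""
    args = [program, "-query", query_file, "-db", database]
    if matrix is not None:
        args += ["-matrix", matrix]
    return args


def fasta_command(program, query_file, library_file, matrix=None,
                  gap_open=None, gap_extend=None):
    """Build a FASTA argument list; options first, then query and library."""
    args = [program]
    if matrix is not None:
        args += ["-s", matrix]
    if gap_open is not None:
        args += ["-f", str(gap_open)]
    if gap_extend is not None:
        args += ["-g", str(gap_extend)]
    # The positional arguments must come after all program options.
    args += [query_file, library_file]
    return args


# Hypothetical file and database names, for illustration only.
print(" ".join(blast_command("blastp", "query.file", "library.db", "BLOSUM62")))
print(" ".join(fasta_command("fastx", "query.file", "library.file", "BP62")))
```

Keeping the commands as lists makes it straightforward to pass them to subprocess.run later without shell quoting problems.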
Searching: Web-based or local? Widely used searching programs
like BLAST and FASTA can be run through web interfaces,
on local computers, or in cloud computing environments like
Amazon Web Services. The BLAST programs were developed at
the NCBI and are tightly integrated into the NCBI's web site
(blast.ncbi.nlm.nih.gov/Blast.cgi). All the programs in the
FASTA package are available at the European Bioinformatics
Institute (EMBL-EBI) web site (www.ebi.ac.uk/Tools/sss); the
EMBL-EBI also provides the BLAST programs. Similarity
searching on the web is convenient; investigators can be confident that
they are using a current version of the search program to search
comprehensive and up-to-date databases. Interactive web access is
often the quickest way to build a comprehensive set of sequences
from a protein family for multiple sequence alignment. For more
time-consuming analyses (e.g., characterization of the thousands of
sequences from a finished microbial genome), both the NCBI and
EMBL-EBI offer programmatic access to their web sites, so that a
computer script or program can launch large numbers of searches
and collect the results.
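
As a rough sketch of what such a script prepares, the Python below builds the request parameters for submitting a search to the NCBI web site and for retrieving its results. The parameter names (CMD, PROGRAM, DATABASE, QUERY, RID, FORMAT_TYPE) and the default database nr are assumptions that should be verified against the NCBI URL API documentation before use; the helper names and the request ID are hypothetical. No request is sent here.

```python
from urllib.parse import urlencode

# Address given in the text for the NCBI BLAST web site.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def submit_params(query, program="blastp", database="nr"):
    """Encode the form parameters assumed for submitting one search."""
    return urlencode({
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    })


def result_url(request_id, format_type="Text"):
    """Build the URL assumed for fetching results by request ID (RID)."""
    params = urlencode({
        "CMD": "Get",
        "RID": request_id,
        "FORMAT_TYPE": format_type,
    })
    return BLAST_URL + "?" + params


# Hypothetical query and request ID, for illustration only.
print(submit_params("MKTAYIAKQR"))
print(result_url("EXAMPLE_RID"))
```

A real script would post the submission parameters, read the request ID from the response, and poll for results at a rate that respects the provider's usage guidelines.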
For large-scale analyses, for example of millions of
metagenomics sequence reads, the similarity searching programs will
typically be run on local computers or a local computer cluster, or
2.3 Where to Search?