Biomedical Engineering Reference
In-Depth Information
Application                            Examples
Sequence Search                        BLAST, BLASTN, CLUSTALW, FASTA, MOTIF, PBLAST, TBLASTN
Submission                             AceDB, Audit, BankIt, Sakura, Sequin, WebIN
Information Retrieval                  Entrez, DBGET, IDEAS
Linkage                                LocusLink
Portal                                 KEGG
Structure Match                        CD, DALI, SCOP, Searchlite, Structure Explorer, VAST
Visualization                          CAD, Cn3D, Mage, RasMol/WebMol, SWISS-PDBViewer, VRML, WebMol
Protein-Protein Interactions           BRITE
Microarray Gene Expression Profiles    Expression
Open-Reading Frame Locator             ORF Finder
Continuing with the example of research on aggression, the data warehouse might contain a
compilation of data on the fruit fly's genome, with a particular focus on the sequence that relates to
genes responsible for serotonin production. Researchers might want to compare sequences in the
fruit fly's genome with those in the human genome suspected of contributing to serotonin
neurotransmitter control, using an application such as the BLAST sequence alignment tool. One
consideration in using any of these online applications is the data format each one expects.
The most popular data formats in bioinformatics include FASTA, PHYLIP, MAML (Microarray Markup
Language), NEXUS, PAUP, FASTA+GAP, and mmCIF. Some formats are specific to particular data
types and applications. For example, mmCIF is used to describe 3D structures, whereas FASTA is
used to describe sequence data. As shown in Figure 2-7, the FASTA format begins with a single-line
description, followed by lines of sequence data. The description line is distinguished from the
sequence data by a greater-than (>) symbol in the first column. Lines of sequence data, which should
be shorter than 80 characters in length, are written in the standard International Union of
Biochemistry/International Union of Pure and Applied Chemistry (IUB/IUPAC) amino acid and nucleic
acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case;
a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid
sequences, U (selenocysteine) and * (translation stop) are acceptable letters.
Figure 2-7. The FASTA Format. This is a standard data format for use with
online sequencing databases.
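
The FASTA rules described above can be illustrated with a minimal Python sketch. It is not taken from the text or from any particular tool: the names parse_fasta, invalid_symbols, SAMPLE, and AMINO_ACID_CODES are hypothetical, the sample record is invented for illustration, and the set of accepted amino acid letters is an assumption based on the one-letter IUB/IUPAC codes plus the exceptions listed above.

```python
from typing import Iterator, List, Optional, Set, Tuple

# Hypothetical sample record; the identifier and residues are made up.
SAMPLE = """>example_seq1 illustrative protein fragment (not real data)
mktayiakqr
QISFVKSHFS-RQLEERLGLIEVQ*
"""

# Assumed alphabet: one-letter IUB/IUPAC amino acid codes, plus the
# exceptions named in the text (U, '*', and '-' for a gap).
AMINO_ACID_CODES: Set[str] = set("ABCDEFGHIKLMNPQRSTVWXYZ") | set("U*-")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted text."""
    description: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(">"):
            # A '>' in the first column starts a new record.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is None:
            raise ValueError("sequence data found before a '>' description line")
        else:
            # Lower-case letters are mapped to upper-case.
            chunks.append(line.strip().upper())
    if description is not None:
        yield description, "".join(chunks)


def invalid_symbols(sequence: str, alphabet: Set[str] = AMINO_ACID_CODES) -> List[str]:
    """Return the sorted symbols in `sequence` that are not in `alphabet`."""
    return sorted(set(sequence) - alphabet)


records = list(parse_fasta(SAMPLE))
for description, sequence in records:
    print(description, len(sequence), invalid_symbols(sequence))
```

Joining the sequence lines into a single string reflects the fact that the line breaks are a layout convention and carry no biological meaning. A nucleic acid check would follow the same pattern with a different alphabet.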