- PRRP [22] optimizes a progressive global alignment by iteratively dividing
the sequences into two groups which are realigned using a global group-to-
group alignment algorithm.
- HMMT [23] is based on a Hidden Markov Model (HMM) and uses simulated
annealing (SA) to maximize the probability that an HMM represents the
sequences to be aligned.
- MUSCLE (multiple sequence comparison by log-expectation) [24] is based
on strategies similar to those used by PRRP.
- SAGA (Sequence Alignment by Genetic Algorithm) [25] is a genetic
algorithm based on the COFFEE (Consistency Objective Function For
alignmEnt Evaluation) objective function [26]. The model described in SAGA
has received considerable interest in the evolutionary computation community.
- Praline [27] is another iterative alignment method; it begins by
preprocessing the sequences to be aligned.
In general, Evolutionary Algorithms tend to be suitable tools for MSA [28]
and can search large solution spaces effectively. However, they spend a lot
of time gradually improving candidate solutions before reaching a solution
comparable to those of deterministic methods [29], because the candidate
alignments are initialized at random.
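To make the cost of random initialization concrete, the following minimal
sketch builds one random candidate alignment by padding every sequence with
gaps at random positions. The function name random_alignment, the extra and
seed parameters, and the gap symbol "-" are assumptions made for
illustration; they are not taken from any of the tools cited above.

```python
import random

def random_alignment(seqs, extra=2, gap="-", seed=None):
    """Return one random candidate alignment (hypothetical helper).

    Every row is padded with gaps at random positions so that all rows
    share a common width of len(longest sequence) + extra columns.
    """
    rng = random.Random(seed)
    width = max(len(s) for s in seqs) + extra
    rows = []
    for s in seqs:
        # Pick the columns that will hold residues; all others become gaps.
        residue_cols = sorted(rng.sample(range(width), len(s)))
        row = [gap] * width
        for col, residue in zip(residue_cols, s):
            row[col] = residue
        rows.append("".join(row))
    return rows

# Example: three short toy sequences, one extra column of slack.
candidate = random_alignment(["ACGT", "AGT", "ACT"], extra=1, seed=42)
```

Each row keeps its residues in their original order, but the gap positions
carry no biological signal, so a search started from such candidates may need
many improvement steps before it becomes competitive.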
5 Results
The immune algorithm presented has been tested on the classical benchmark
BAliBASE, versions 1.0 and 2.0. BAliBASE (Benchmark Alignment dataBASE) [36]
is a database of high-quality (manually refined) multiple sequence
alignments, developed to evaluate and compare multiple alignment programs.
BAliBASE is available in two versions. The first version contains 141
reference alignments and is divided into five hierarchical reference sets
containing twelve representative alignments. Moreover, the core blocks are
defined for each alignment: these are the regions that can be reliably
aligned, and they represent 58% of the residues in the alignments. The
remaining 42% lie in ambiguous regions that cannot be reliably aligned.
Reference 1 contains alignments of equidistant sequences of similar length.
Reference 2 contains alignments of a family (closely related sequences with
> 25% identity) and 3 "orphan" sequences with < 20% identity. Reference 3
consists of up to four families with < 25% identity between any two
sequences from different families. References 4 and 5 contain sequences with
large N/C-terminal extensions or internal insertions. For an extensive
explanation of all references, please refer to [3].
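The reference sets above are characterized by percent identity thresholds
(25% and 20%). As a rough illustration, the sketch below computes the percent
identity of two aligned rows. The function name percent_identity is
hypothetical, and the definition used here (identical residues divided by the
number of columns in which neither row has a gap) is only one of several
conventions in use; the source does not say which one BAliBASE applies.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity of two aligned rows (assumed definition).

    Columns where either row has a gap are ignored; the result is the
    share of the remaining columns whose residues are identical.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    # Two rows with no gap-free column in common are reported as 0% identical.
    return 100.0 * matches / compared if compared else 0.0

identity = percent_identity("ACGT", "ACGA")  # 3 of 4 columns match
```

Under this definition, a pair of rows scoring above 25 would fall on the
"closely related" side of the Reference 2 threshold, and a pair scoring below
20 on the "orphan" side.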
In the second version, BAliBASE v.2.0 [37], all alignments present in the
first version have been manually verified, and three new reference sets have
been added: repeats, circular permutations and transmembrane proteins. It
consists of 167
 