Aligning Multiple Protein Sequences by Hybrid Clonal Selection Algorithm with Insert-Remove-Gaps and BlockShuffling Operators - Artificial Immune Systems


- PRRP [22] optimizes a progressive global alignment by iteratively dividing the sequences into two groups, which are then realigned using a global group-to-group alignment algorithm.

- HMMT [23] is based on a Hidden Markov Model (HMM) and uses simulated annealing (SA) to maximize the probability that an HMM represents the sequences to be aligned.

- MUSCLE (multiple sequence comparison by log-expectation) [24] is based on strategies similar to those used by PRRP.

- SAGA (Sequence Alignment by Genetic Algorithm) [25] is a genetic algorithm based on the COFFEE (Consistency Objective Function For alignmEnt Evaluation) objective function [26]. The model described in SAGA has received considerable interest in the evolutionary computation community.

- Another iterative alignment method is Praline [27], which begins by preprocessing the sequences to be aligned.
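To make the idea of a consistency objective more concrete, the sketch below computes a simplified, unweighted COFFEE-style score: the fraction of residue pairs in the induced pairwise alignments that also appear in a library of reference pairwise alignments. It is an illustration under stated assumptions, not SAGA's implementation: the original COFFEE function also weights each sequence pair, and the library format used here (a set of `(i, ri, j, rj)` tuples) and the function names are hypothetical.

```python
from itertools import combinations


def residue_indices(row):
    """Map each column of a gapped row to the residue index it holds (None for a gap)."""
    idx, out = 0, []
    for ch in row:
        if ch == "-":
            out.append(None)
        else:
            out.append(idx)
            idx += 1
    return out


def coffee_like_score(alignment, library):
    """Simplified, unweighted COFFEE-style consistency score in [0, 1].

    alignment: list of equal-length gapped strings ('-' is the gap).
    library:   set of (i, ri, j, rj) tuples with i < j, meaning residue ri of
               sequence i is aligned to residue rj of sequence j in some
               reference pairwise alignment (hypothetical input format).
    """
    maps = [residue_indices(row) for row in alignment]
    matched, total = 0, 0
    for i, j in combinations(range(len(alignment)), 2):
        for a, b in zip(maps[i], maps[j]):
            if a is None and b is None:
                continue  # gap/gap column is not part of the induced pairwise alignment
            total += 1  # counts toward the length of the induced pairwise alignment
            if a is not None and b is not None and (i, a, j, b) in library:
                matched += 1  # residue pair is supported by the library
    return matched / total if total else 0.0
```

An alignment that reproduces more library pairs, while introducing fewer gap columns, scores higher; this is the quantity a genetic algorithm such as SAGA would try to maximize.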

In general, Evolutionary Algorithms tend to be suitable tools for MSA [28] and can be used to search large solution spaces effectively. However, they spend a lot of time gradually improving potential solutions before reaching a solution comparable to those found by deterministic methodologies [29]. This is due to the random initialization of the candidate alignments.
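A minimal sketch of such a random initialization, assuming a SAGA-like scheme in which gaps are scattered at random into each sequence until all rows have the same length. The function name and the `max_extra_gaps` parameter are hypothetical.

```python
import random


def random_alignment(seqs, max_extra_gaps=3, rng=None):
    """Build a random candidate alignment by scattering gaps into each sequence.

    Every row is padded to the same target length: the longest sequence plus a
    random number (0..max_extra_gaps) of extra columns. Gap positions are chosen
    uniformly at random; residues keep their original order.
    """
    rng = rng or random.Random()
    target = max(len(s) for s in seqs) + rng.randint(0, max_extra_gaps)
    rows = []
    for s in seqs:
        n_gaps = target - len(s)
        # pick which columns of this row hold gaps
        gap_cols = set(rng.sample(range(target), n_gaps))
        residues = iter(s)
        rows.append("".join("-" if c in gap_cols else next(residues) for c in range(target)))
    return rows


# e.g. a hypothetical initial population of 50 candidates:
# population = [random_alignment(seqs, rng=random.Random(seed)) for seed in range(50)]
```

Because the gap positions ignore residue similarity, rows built this way rarely place homologous residues in the same column, so the search starts far from a good alignment.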

5 Results

The immune algorithm presented here has been tested on the classical benchmark BAliBASE, versions 1.0 and 2.0. BAliBASE (Benchmark Alignment dataBASE) [36] is a database of high-quality, manually refined multiple sequence alignments, developed to evaluate and compare multiple alignment programs.
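Comparing a program's output with a BAliBASE reference is commonly done with a sum-of-pairs (SP) style score: the fraction of residue pairs aligned in the reference that the test alignment also aligns. The sketch below is a simplified version under that assumption, not the benchmark's official scoring program; it assumes both alignments contain the same sequences in the same order, with `-` as the gap character, and the function names are hypothetical.

```python
def aligned_pairs(alignment):
    """Set of (i, ri, j, rj) with i < j: residue ri of seq i shares a column with residue rj of seq j."""
    counters = [0] * len(alignment)  # next residue index for each sequence
    pairs = set()
    for col in range(len(alignment[0])):
        present = []  # (sequence, residue index) for residues in this column, in sequence order
        for i, row in enumerate(alignment):
            if row[col] != "-":
                present.append((i, counters[i]))
                counters[i] += 1
        for x in range(len(present)):
            for y in range(x + 1, len(present)):
                (i, ri), (j, rj) = present[x], present[y]
                pairs.add((i, ri, j, rj))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref) if ref else 0.0
```

Note that the score is asymmetric: extra pairs aligned in the test alignment but absent from the reference are not penalized directly.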

BAliBASE is available in two versions. The first version contains 141 reference alignments, divided into five hierarchical reference sets containing twelve representative alignments. Moreover, core blocks are defined for each alignment: these are the regions that can be reliably aligned, and they represent 58% of the residues in the alignments. The remaining 42% lie in ambiguous regions that cannot be reliably aligned.
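Given a per-column core-block annotation, the share of residues lying in core blocks (the kind of figure behind the 58% above) can be computed as below. The boolean `core_mask` input and the function name are hypothetical; parsing BAliBASE's own annotation files is not shown.

```python
def core_residue_fraction(alignment, core_mask):
    """Fraction of residues (non-gap characters) that fall in core-block columns.

    alignment: list of equal-length gapped strings ('-' is the gap).
    core_mask: list of booleans, True where the column belongs to a core block.
    """
    if any(len(row) != len(core_mask) for row in alignment):
        raise ValueError("every row must match the mask length")
    in_core = total = 0
    for row in alignment:
        for ch, is_core in zip(row, core_mask):
            if ch != "-":
                total += 1
                if is_core:
                    in_core += 1
    return in_core / total if total else 0.0
```

Counting residues rather than columns matches the way the 58%/42% split is phrased, since gap-rich columns contribute fewer residues.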

Reference 1 contains alignments of equidistant sequences of similar length; reference 2 contains alignments of a family (closely related sequences with > 25% identity) plus three "orphan" sequences with < 20% identity; reference 3 consists of up to four families with < 25% identity between any two sequences from different families; and references 4 and 5 contain sequences with large N/C-terminal extensions or internal insertions, respectively. For an extensive explanation of all references, please refer to [3].
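Since the reference sets are defined by identity thresholds, a sketch of pairwise percent identity may help. It assumes identity is measured over columns where both sequences have a residue, which is only one of several common definitions; the `is_orphan` helper and its 20% default are a hypothetical illustration of the reference 2 criterion.

```python
def percent_identity(row_a, row_b):
    """Percent identity of two aligned rows, over columns where both have a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must be the same length")
    compared = identical = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either row
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def is_orphan(k, alignment, threshold=20.0):
    """True if sequence k has less than `threshold` % identity to every other sequence."""
    return all(
        percent_identity(alignment[k], row) < threshold
        for i, row in enumerate(alignment)
        if i != k
    )
```

For example, `percent_identity("ACDE", "ACDF")` compares four columns with three matches, giving 75.0.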

In the second version, BAliBASE v.2.0 [37], all alignments present in the first version have been manually verified, and three new reference sets have been added: repeats, circular permutations and transmembrane proteins. It consists of 167
