Aligning Multiple Protein Sequences by Hybrid Clonal Selection Algorithm with Insert-Remove-Gaps and BlockShuffling Operators - Artificial Immune Systems

To align a set of bio-sequences, one needs a reliable objective function that measures the biological plausibility of an alignment analytically or computationally. Alignment quality is often the limiting factor in the analysis of biological sequences; defining an appropriate and efficient objective function can remove this limitation, and doing so remains an active research field [3]. A simple objective function to optimize is the weighted sums-of-pairs (SP) with affine gap penalties [4], where each sequence receives a weight proportional to the amount of independent information it contains [5] and the cost of the multiple alignment is equal to the sum of the costs of all the weighted pairwise substitutions.
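The weighted sums-of-pairs objective with affine gap penalties can be illustrated with a minimal Python sketch. It is not the scoring scheme of [4] or [5]: the match/mismatch values stand in for a real substitution matrix, the `gap_open` and `gap_extend` defaults are hypothetical, the pair weight is assumed to be the product of the two sequence weights, and the function names `pair_score` and `weighted_sp` are illustrative only. The result is expressed as a score (higher is better) rather than a cost.

```python
from itertools import combinations

GAP = "-"


def pair_score(a, b, gap_open=10.0, gap_extend=1.0, match=1.0, mismatch=-1.0):
    """Score two aligned rows with affine gap penalties (higher is better).

    Assumption: a toy match/mismatch score replaces a real substitution matrix.
    """
    score = 0.0
    prev_gap_in = None  # row that held the gap in the previous scored column
    for x, y in zip(a, b):
        if x == GAP and y == GAP:
            continue  # column is empty in this pairwise projection: skip it
        if x == GAP or y == GAP:
            gap_in = "a" if x == GAP else "b"
            if gap_in != prev_gap_in:
                score -= gap_open  # a new gap run starts here
            score -= gap_extend  # every gapped column pays the extension cost
            prev_gap_in = gap_in
        else:
            score += match if x == y else mismatch
            prev_gap_in = None
    return score


def weighted_sp(alignment, weights, **kwargs):
    """Sum the pairwise scores over all pairs of rows.

    Assumption: each pair (i, j) is weighted by weights[i] * weights[j].
    """
    total = 0.0
    for i, j in combinations(range(len(alignment)), 2):
        total += weights[i] * weights[j] * pair_score(
            alignment[i], alignment[j], **kwargs
        )
    return total


rows = ["AC-T", "ACGT", "A--T"]
print(weighted_sp(rows, [1.0, 1.0, 1.0], gap_open=2.0, gap_extend=1.0))  # -3.0
```

With unit weights the three pairwise scores are 0, -1 and -2, which gives the total of -3.0. Non-uniform weights would down-weight pairs of closely related sequences, which is the role the text assigns to the sequence weights.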

This research paper proposes a Hybrid Clonal Selection Algorithm (CSA) which incorporates specific perturbation operators for MSA of amino-acid sequences. The obtained results show that the proposed Immune Algorithm is comparable to state-of-the-art algorithms.

2 The Multiple Sequence Alignment Problem

Determining whether two biological sequences have common sub-sequences is the most popular sequence analysis problem. As described in [2], there are four fundamental topics: (1) what kinds of alignment should be considered; (2) the scoring function adopted to evaluate alignments; (3) the alignment algorithm designed to find optimal (or suboptimal) scoring alignments; (4) the statistical methods used to assess the significance of an alignment score. This paper focuses on the key issues of the design and efficient implementation of alignment algorithms for finding optimal and suboptimal alignments of protein structures, but the technique is also applicable to DNA alignments.

Definition 1 [Sequence Alignment]. Let $S = \{S_1, S_2, \ldots, S_n\}$ be a set of $n$ sequences (strings) over a finite alphabet $\Sigma$, each sequence $S_i$ consisting of $\ell_i$ ordered characters $s_{i,j}$:

$S_i = s_{i,1}\, s_{i,2} \ldots s_{i,\ell_i}, \quad \forall i = 1, 2, \ldots, n$

Let $\hat{\Sigma} = \Sigma \cup \{-\}$ be a new alphabet obtained by adding the symbol dash '-' to represent gaps.

Then a set $\hat{S} = \{\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_n\}$ of sequences over the alphabet $\hat{\Sigma}$ is called a sequence alignment of the set of sequences $S$ if the following properties are fulfilled:

1. All strings in $\hat{S}$ have the same length $\hat{\ell}$, with $\max_{i=1 \ldots n}(\ell_i) \le \hat{\ell} \le \sum_{i=1}^{n} \ell_i$. $\hat{S}$ can be interpreted as an $n \times \hat{\ell}$ matrix where the $i$-th row contains string $\hat{S}_i$.

2. Ignoring gaps, sequence $\hat{S}_i$ is identical with sequence $S_i$, $\forall i = 1, 2, \ldots, n$.

3. $\hat{S}$ has no columns that contain gaps only.
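The three properties of Definition 1 translate directly into a validity check. The following Python sketch is illustrative only; the function name `is_valid_alignment` and the representation of sequences as plain strings are assumptions, not part of the paper.

```python
GAP = "-"


def is_valid_alignment(sequences, aligned):
    """Return True if `aligned` is a sequence alignment of `sequences`
    in the sense of Definition 1 (hypothetical helper, plain-string rows)."""
    if not aligned or len(sequences) != len(aligned):
        return False

    # Property 1: all rows share one length, bounded below by the longest
    # input sequence and above by the sum of all input lengths.
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        return False
    lengths = [len(seq) for seq in sequences]
    if not (max(lengths) <= length <= sum(lengths)):
        return False

    # Property 2: removing the gaps from row i gives back sequence i.
    if any(row.replace(GAP, "") != seq for row, seq in zip(aligned, sequences)):
        return False

    # Property 3: no column consists of gaps only.
    return not any(all(ch == GAP for ch in column) for column in zip(*aligned))


print(is_valid_alignment(["ACT", "ACGT", "AT"], ["AC-T", "ACGT", "A--T"]))  # True
print(is_valid_alignment(["ACT", "AGT"], ["A-CT", "A-GT"]))  # False: column 2 is all gaps
```

The length bounds in property 1 already follow from properties 2 and 3, so the explicit check is redundant for well-formed input; it is kept here to mirror the definition.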
