Aligning a set of bio-sequences requires a reliable objective function that
measures, analytically or computationally, the biological plausibility of an
alignment. Alignment quality is often the limiting factor in the analysis of
biological sequences; defining an appropriate and efficient objective function
can remove this limitation, and it is an active research field [3]. A simple
objective function to optimize is the weighted sums-of-pairs (SP) with affine
gap penalties [4], where each sequence receives a weight proportional to the
amount of independent information it contains [5] and the cost of the multiple
alignment is equal to the sum of the costs of all the weighted pairwise
substitutions.
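As an illustration of the weighted SP objective described above, the following Python sketch scores a multiple alignment pair by pair. It is only a minimal sketch under stated assumptions, not the scoring scheme of [4] or [5]: the function names, the pair weight taken as the product of the two sequence weights, the toy match/mismatch score, and the penalty values are all hypothetical choices made for the example.

```python
from itertools import combinations


def toy_subst(x, y):
    """Hypothetical match/mismatch score standing in for a substitution matrix."""
    return 1 if x == y else -1


def pair_score(a, b, subst, gap_open, gap_extend):
    """Score the pairwise alignment induced by two rows of a multiple alignment.

    Assumption: the first position of a gap costs gap_open and every
    further position of the same gap costs gap_extend (affine penalty).
    """
    score = 0.0
    gap_in = None  # row that currently holds an open gap: "a", "b", or None
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap-only column for this pair carries no information
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            score -= gap_extend if gap_in == side else gap_open
            gap_in = side
        else:
            score += subst(x, y)
            gap_in = None
    return score


def weighted_sp_score(alignment, weights, subst, gap_open=10.0, gap_extend=1.0):
    """Sum the pair scores over all row pairs, each scaled by w_i * w_j.

    Assumption: the pair weight is the product of the sequence weights.
    """
    total = 0.0
    for i, j in combinations(range(len(alignment)), 2):
        total += weights[i] * weights[j] * pair_score(
            alignment[i], alignment[j], subst, gap_open, gap_extend
        )
    return total


rows = ["ACGT", "A--T", "ACGT"]
print(weighted_sp_score(rows, [1.0, 1.0, 1.0], toy_subst))  # -14.0
```

With these assumed penalties, the two pairs involving the gapped row each score 1 - 10 - 1 + 1 = -9 and the identical pair scores 4, giving -14 in total. Lowering the weight of the gapped row reduces its influence on the sum.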
This research paper proposes a Hybrid Clonal Selection Algorithm (CSA)
which incorporates specific perturbation operators for the multiple sequence
alignment (MSA) of amino-acid sequences. The obtained results show that the
proposed Immune Algorithm is comparable to state-of-the-art algorithms.
2 The Multiple Sequence Alignment Problem
Determining whether two biological sequences have common sub-sequences is the
most popular sequence analysis problem. As described in [2], there are four
fundamental topics: (1) what kinds of alignment should be considered; (2) the
scoring function adopted to evaluate alignments; (3) the alignment algorithm
designed to find optimal (or suboptimal) scoring alignments; (4) the
statistical methods used to assess the significance of an alignment score.
This paper focuses on the key issues of the design and efficient implementation
of alignment algorithms for finding optimal and suboptimal alignments of
protein structures; the technique is also applicable to DNA alignments.
Definition 1 [Sequence Alignment]. Let $S = \{S_1, S_2, \ldots, S_n\}$ be a set
of $n$ sequences (strings) over a finite alphabet $\Sigma$, each sequence $S_i$
consisting of $\ell_i$ ordered characters $s_{i,j}$:

$$S_i = s_{i,1}\, s_{i,2} \ldots s_{i,\ell_i}, \qquad i = 1, 2, \ldots, n.$$

Let $\hat{\Sigma}$ be a new alphabet, $\hat{\Sigma} = \Sigma \cup \{-\}$,
obtained by adding the dash symbol '-' to represent gaps. Then a set
$\hat{S} = \{\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_n\}$ of sequences over the
alphabet $\hat{\Sigma}$ is called a sequence alignment of the set of sequences
$S$ if the following properties are fulfilled:

1. All strings in $\hat{S}$ have the same length $\hat{\ell}$, with
   $$\max_{i=1 \ldots n}(\ell_i) \le \hat{\ell} \le \sum_{i=1}^{n} \ell_i.$$
   $\hat{S}$ can be interpreted as an $n \times \hat{\ell}$ matrix where the
   $i$-th row contains string $\hat{S}_i$.
2. Ignoring gaps, sequence $\hat{S}_i$ is identical with sequence $S_i$,
   $i = 1, 2, \ldots, n$.
3. $\hat{S}$ has no columns that contain gaps only.
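
The three properties of Definition 1 can be checked mechanically. The following Python sketch is a minimal illustration only: the function name `is_valid_alignment` and the representation of sequences as plain strings with '-' as the gap symbol are assumptions made for the example, not part of the paper.

```python
def is_valid_alignment(seqs, aligned, gap="-"):
    """Return True if `aligned` is a sequence alignment of `seqs` (Definition 1)."""
    if not aligned or len(seqs) != len(aligned):
        return False

    # Property 1: all rows share one length, bounded below by the longest
    # input sequence and above by the sum of all input lengths.
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        return False
    lengths = [len(s) for s in seqs]
    if not max(lengths) <= length <= sum(lengths):
        return False

    # Property 2: removing the gaps from each row gives back the original sequence.
    if any(row.replace(gap, "") != s for row, s in zip(aligned, seqs)):
        return False

    # Property 3: no column consists of gaps only.
    return all(any(ch != gap for ch in column) for column in zip(*aligned))


seqs = ["ACGT", "AT"]
print(is_valid_alignment(seqs, ["ACGT", "A--T"]))    # True
print(is_valid_alignment(seqs, ["ACGT-", "A--T-"]))  # False: last column is gap-only
```

Viewing the alignment as an n-row matrix, `zip(*aligned)` iterates over its columns, which is what property 3 inspects.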
 