shifted into the freed space. The designed CSA considers only two immunolog-
ical entities: antigens (Ags) and B cells. The Ag is the problem to solve, i.e. a
given MSA instance, and B cells are the candidate solutions, i.e. a set of
alignments that have solved (or approximated) the initial problem [32,33]. To
tackle the multiple sequence alignment problem, Ags and B cells are represented
by a matrix of sequences.
Let Σ = {A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, V} be the
alphabet, where each symbol represents one of the twenty amino acids, and let
S = {S_1, S_2, ..., S_n} be the set of n ≥ 2 sequences with lengths
{ℓ_1, ℓ_2, ..., ℓ_n}, such that S_i ∈ Σ*. Therefore, an Ag is represented by a
matrix of n rows and max{ℓ_1, ..., ℓ_n} columns, whereas for the B cells an
(n × ℓ) matrix was used, with ℓ = (3/2 · max{ℓ_1, ..., ℓ_n}). These values were
determined experimentally; with them the proposed algorithm was able to develop
more compact alignments. In particular, for the B cells a binary matrix was
used, where s_{i,j} = 0 refers to a gap in the alignment and s_{i,j} = 1 to a
residue, with 1 ≤ i ≤ n and 1 ≤ j ≤ ℓ.
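The binary B cell encoding can be illustrated with a minimal Python sketch. The function names, the "-" gap character and the default width factor are assumptions introduced here for illustration; they are not taken from the paper. Each row stores 1 where a residue sits and 0 where there is a gap, so the original sequence is needed to turn a row back into text.

```python
from typing import List

GAP = "-"  # display character for a gap (an assumption of this sketch)


def b_cell_columns(seqs: List[str], factor: float = 1.5) -> int:
    """Number of B cell columns as a multiple of the longest sequence.

    The default factor is an illustrative assumption.
    """
    return int(factor * max(len(s) for s in seqs))


def encode(aligned_row: str) -> List[int]:
    """Map one aligned row to bits: 1 = residue, 0 = gap."""
    return [0 if ch == GAP else 1 for ch in aligned_row]


def decode(bits: List[int], seq: str) -> str:
    """Rebuild the aligned row from its bits and the ungapped sequence."""
    if sum(bits) != len(seq):
        raise ValueError("number of 1-bits must equal the sequence length")
    residues = iter(seq)
    # Consume one residue for every 1-bit, emit a gap for every 0-bit.
    return "".join(next(residues) if b else GAP for b in bits)
```

For example, the row "A--N--" of a six-column B cell is stored as [1, 0, 0, 1, 0, 0], and decoding those bits with the sequence "AN" gives the row back.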
A Initialize the Population
Two different strategies were used to create the initial population (t = 0) of
candidate alignments. The first strategy, random initialization, is based on the
use of random "offsets" to shift the initial sequences in the following way: an
offset is chosen uniformly at random in the range [0, (ℓ − ℓ_i)], where ℓ_i is
the length of S_i and ℓ the number of B cell columns, and the sequence S_i is
then shifted by that many positions towards the right side of row i of the
current B cell.
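The random initialization strategy can be sketched as follows. This is a minimal illustration: the function name and the `columns` and `rng` parameters are hypothetical, and it assumes that each row holds its sequence as one contiguous block of residues after the shift.

```python
import random
from typing import List, Optional


def random_offset_b_cell(
    seqs: List[str], columns: int, rng: Optional[random.Random] = None
) -> List[List[int]]:
    """Build one binary B cell by shifting each sequence right by a random offset."""
    rng = rng or random.Random()
    cell = []
    for s in seqs:
        if len(s) > columns:
            raise ValueError("sequence longer than the B cell row")
        # randint is inclusive on both ends: offset lies in [0, columns - len(s)].
        offset = rng.randint(0, columns - len(s))
        row = [0] * offset + [1] * len(s) + [0] * (columns - offset - len(s))
        cell.append(row)
    return cell
```

Passing a seeded `random.Random` instance makes the generated offsets reproducible, which is convenient when comparing runs.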
A second way to initialize the population was also analyzed: seeding the initial
population with CLUSTALW (CLUSTALW-seeding). However, a percentage of the
population was still initialized using the offsets strategy described above, to
prevent the algorithm from getting trapped in a local optimum. Hence, the second
strategy creates a percentage of the initial alignments using CLUSTALW, and the
remaining alignments are created by random offsets.
Preliminary experimental results show that the proposed algorithm achieves
better performance using the second strategy. Therefore, all results shown in
this paper were obtained using a combination of the two previously introduced
strategies (80% of the B cell population by CLUSTALW-seeding and 20% by random
initialization using random offsets).
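A minimal sketch of this mixed initialization is given below. The text does not say how the seeded share is generated, so the sketch assumes copies of a single precomputed alignment that is passed in as `seed_cell`; running CLUSTALW itself is out of scope here. The names `init_population`, `seed_cell`, `make_random_cell` and `seeded_fraction` are hypothetical.

```python
import copy
from typing import Callable, List

BCell = List[List[int]]  # binary matrix: 1 = residue, 0 = gap


def init_population(
    d: int,
    seed_cell: BCell,
    make_random_cell: Callable[[], BCell],
    seeded_fraction: float = 0.8,
) -> List[BCell]:
    """Create d B cells: a seeded share plus a randomly initialized remainder."""
    n_seeded = round(seeded_fraction * d)
    # Seeded individuals: independent copies of the precomputed alignment
    # (an assumption of this sketch).
    population = [copy.deepcopy(seed_cell) for _ in range(n_seeded)]
    # Remaining individuals come from the random offsets strategy.
    population += [make_random_cell() for _ in range(d - n_seeded)]
    return population
```

With d = 10 and the default fraction, eight B cells are copies of the seed alignment and two are produced by the callable that implements the random offsets.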
The presented hybrid IA incorporates the classical static cloning operator,
which clones each B cell dup times, producing an intermediate population
P^(clo)_{N_c} of N_c = d × dup B cells (where d is the population size).
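As a minimal sketch, static cloning amounts to copying every B cell the same number of times. The function name is hypothetical, and deep copies are used here so that later mutation of a clone does not alter its parent.

```python
import copy
from typing import List

BCell = List[List[int]]  # binary matrix: 1 = residue, 0 = gap


def static_cloning(population: List[BCell], dup: int) -> List[BCell]:
    """Return the intermediate population of d * dup clones (d = len(population))."""
    return [copy.deepcopy(cell) for cell in population for _ in range(dup)]
```

For a population of d = 3 B cells and dup = 2, the intermediate population holds N_c = 6 clones, with the clones of each parent stored next to each other.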
The basic mutation processes considered in pairwise and multiple sequence
alignment are: substitutions, which change sequences of amino acids, as well as
insertions and deletions, which add or remove amino acids and/or gaps. In a
first version of the algorithm the classical hypermutation and
hypermacromutation operators were used: the first operator flips a bit, using a
number of mutations inversely proportional to the fitness function value [34],
whereas the hypermacromutation simply swaps two randomly chosen subsequences.
However, the first experiments produced non-optimal alignments, leading