Information Technology Reference
In-Depth Information
4.1 Problems
We have selected two permutation problems that have been widely discussed in
the literature, exhibiting multiple applications in both academic and industrial
fields [7,37].
First, the TSP can be considered one of the most popular permutation combi-
natorial problems in the literature [15]. TSP is defined as a permutation problem
with the objective of finding, given a list of cities and the distances between each
pair of cities, the path of the shortest length (or the minimum cost) that a sales-
man has to take by visiting all the cities exactly once and returning to the
starting point. The fitness of a TSP solution is calculated as usual (i.e., as the
sum of edges' weights in the solution tour).
In another sense, DNA-FAP is one of the fundamental problems in computa-
tional molecular biology. This problem involves the combination of the partial
information from known fragments to find a consistent total DNA chain. Hence,
large DNA strands need to be broken into small fragments for sequencing in a
process called shotgun sequencing [34]. But this process does not keep either
the ordering of the fragments or the portion a particular fragment came from.
This leads to the DNA fragment assembly problem [17] where these short se-
quences have to be then reassembled in order, by using the overlapping portions
as landmarks. Most fragment assembly algorithms consist of the following steps:
-Overlap : Finding potentially overlapping reads
-Layout : Finding the order of reads along DNA
- Consensus : Deriving the DNA sequence from the layout
The overlap problem consists in finding the best match between the sux of
one read and the prefix of another one. The common practice is to filter out
pairs of fragments that do not share a significantly long common substring.
Constructing the layout is the hardest step in fragment assembly [28]. The
diculty is encountered when deciding whether two fragments really overlap
(i.e., their differences are caused by sequencing errors) or they actually come
from two different copies of a repeat. Repeats represent a major challenge for
whole genome shotgun sequencing and make the layout problem very di cult.
The final consensus step of fragment assembly amounts to correcting errors
in sequence reads. To measure the quality of a consensus, we can look at the
coverage distribution. Coverage at a base position is defined as the number of
fragments at that position. It is a measure of the redundancy of the fragment
data, and it denotes the number of fragments, on average, where a given nu-
cleotide in the target DNA is expected to appear. It is computed as the number
of bases read from fragments over the target DNA's length [17]. For a firefly
i =[0 , ..., a, a +1 , ..., n ] the Equation 3 shows the fitness (to maximize) of the
sequence i for DNA-FAP.
n
1
Fitness ( i )=
w a,a +1
(3)
a =0
where w a,a +1 is the pairwise overlap strength of fragments a and a +1[31].
 
Search WWH ::




Custom Search