Biomedical Engineering Reference
In-Depth Information
6.6 CONCLUSIONS
Computational molecular biology is rapidly becoming a very important research area.
Owing to the amount and complexity of the organisms that are being constantly
sequenced all over the world, it is becoming unthinkable to analyze these new genomes
without the aid of computers.
One of the most fundamental problem in computational molecular biology is bio-
logical sequence comparison, which aims to determine the similarity between two
DNA (or protein) sequences. This comparison is very important because biologists
often infer the properties of newly sequenced genomes by comparing them against
genomes which are already determined.
Formally speaking, this is a problem of approximate pattern matching where the
goal is to determine which sequences are similar to a query sequence. Algorithms
such as the ones proposed by NW [21] and SW [28] are exact algorithms which
use dynamic programming techniques to solve the sequence comparison problem.
However, the time and space complexities of these algorithms is O(n 2 ) , where n is
the size of the sequences. Therefore, they are often considered impracticable when
huge sequences are compared against huge databases.
Hirschberg [13] proposed an exact algorithm that can be adapted to solve the
sequence alignment problem, which requires O(n) space, at the expense of doubling
the execution time. Other exact methods were proposed but, as far as we know, there
is still no exact algorithm that runs on less asymptotic time than NW.
For this reason, heuristic methods such as FASTA and BLAST were proposed.
Although those methods are much faster than the exact algorithms, their sensibility
is often reduced and a great effort must be done to find appropriate parameter tuning.
An obvious way to accelerate these algorithms is by running them in multiple
processors or multiple nodes. Many techniques were proposed to parallelize the SW
algorithm, and good results were indeed obtained. As the time complexity of BLAST
is O(n) , the common approach is to distribute the database's sequences among the
nodes to reduce the execution time of the whole process. In this sense, many algorithms
were proposed and good reductions on execution time were achieved. However, in
real genome projects, SW is rarely used as its execution time, even in multiple proces-
sors, is still very high. Thus, biologists often use heuristic methods such as BLAST,
sacrificing sensitiveness for the sake of performance.
To conclude, biological sequence comparison is a very challenging problem. This
is mainly due to the fact that we must find approximate pattern matching between a
query sequence and a very huge database. There is a clear need for new algorithms
that could obtain results with high sensitivity in a reasonable time. In addition, new
ways to accelerate the known algorithms must be found. In this direction, parallel
and distributed processing seems to be a very good approach. Biological sequence
comparison is, thus, still an open problem and very intensive research activity is done
in this area.
Search WWH ::




Custom Search