Computational Molecular Biology - Parallel Computing for Bioinformatics and Computational Biology

Biomedical Engineering Reference

In-Depth Information

6.6 CONCLUSIONS

Computational molecular biology is rapidly becoming a very important research area.

Owing to the amount and complexity of the organisms that are being constantly

sequenced all over the world, it is becoming unthinkable to analyze these new genomes

without the aid of computers.

One of the most fundamental problem in computational molecular biology is bio-

logical sequence comparison, which aims to determine the similarity between two

DNA (or protein) sequences. This comparison is very important because biologists

often infer the properties of newly sequenced genomes by comparing them against

genomes which are already determined.

Formally speaking, this is a problem of approximate pattern matching where the

goal is to determine which sequences are similar to a query sequence. Algorithms

such as the ones proposed by NW [21] and SW [28] are exact algorithms which

use dynamic programming techniques to solve the sequence comparison problem.

However, the time and space complexities of these algorithms is O(n 2 ) , where n is

the size of the sequences. Therefore, they are often considered impracticable when

huge sequences are compared against huge databases.

Hirschberg [13] proposed an exact algorithm that can be adapted to solve the

sequence alignment problem, which requires O(n) space, at the expense of doubling

the execution time. Other exact methods were proposed but, as far as we know, there

is still no exact algorithm that runs on less asymptotic time than NW.

For this reason, heuristic methods such as FASTA and BLAST were proposed.

Although those methods are much faster than the exact algorithms, their sensibility

is often reduced and a great effort must be done to find appropriate parameter tuning.

An obvious way to accelerate these algorithms is by running them in multiple

processors or multiple nodes. Many techniques were proposed to parallelize the SW

algorithm, and good results were indeed obtained. As the time complexity of BLAST

is O(n) , the common approach is to distribute the database's sequences among the

nodes to reduce the execution time of the whole process. In this sense, many algorithms

were proposed and good reductions on execution time were achieved. However, in

real genome projects, SW is rarely used as its execution time, even in multiple proces-

sors, is still very high. Thus, biologists often use heuristic methods such as BLAST,

sacrificing sensitiveness for the sake of performance.

To conclude, biological sequence comparison is a very challenging problem. This

is mainly due to the fact that we must find approximate pattern matching between a

query sequence and a very huge database. There is a clear need for new algorithms

that could obtain results with high sensitivity in a reasonable time. In addition, new

ways to accelerate the known algorithms must be found. In this direction, parallel

and distributed processing seems to be a very good approach. Biological sequence

comparison is, thus, still an open problem and very intensive research activity is done

in this area.

Search WWH ::

Custom Search

Home