1. Analysis of current work
Protein and gene sequences are compared for similarity in order to examine whether they are
related by evolution, that is, whether they share a common ancestor. However, sequences with a
common ancestor accumulate random mutations over time, and similar portions can also arise in
sequences with different structures and functions; both effects should be considered in such
studies. In parts of the sequence that are critical for the function of the protein, hardly any
mutations will be accepted; nearly all changes in such regions will destroy the function [2].
One important algorithm used in sequence analysis is Dynamic Programming (DP). In DP, a
large table is built that stores all previously computed results, and the solution of the problem
depends on the solutions of the smaller problems in the table. A recursive structure for
computing the optimal score is designed, and the interdependent sub-solutions are filled into the
table using the recurrence rule. The table is filled iteratively according to this recurrence, and
the result is computed in a bottom-up fashion. The table should be constructed efficiently, since
scanning it already leads to quadratic running times. What if (a) the solutions of smaller
problems of the same kind cannot be combined to form the solution of a larger one, (b) the
number of small problems to solve is unacceptably large, or (c) the costs are fractional, which
limits the efficiency of DP? In such cases, reducing the search space and employing other
techniques such as Top-Down DP, Divide and Conquer, the Greedy Approach and Progressive
Sequence Alignment, either alongside the procedure or in place of it, might help. The bottom
line is that DP is applicable when the subproblems are not independent and the problem is an
optimisation problem.
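As a minimal sketch of the bottom-up table filling described above, the following computes the
length of the longest common subsequence of two sequences. The problem choice, the function
name lcs_length and the variable names are assumptions made for illustration; they are not taken
from the source.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (bottom-up DP)."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] holds the answer for the prefixes a[:i] and b[:j].
    # Row 0 and column 0 stay 0: an empty prefix shares nothing.
    table = [[0] * cols for _ in range(rows)]

    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the best result for the shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise reuse the better of the two neighbouring sub-solutions.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # The bottom-right cell is the solution for the full sequences.
    return table[-1][-1]


if __name__ == "__main__":
    print(lcs_length("AGGTAB", "GXTXAYB"))
```

Each cell is computed once from at most three already filled cells, so the work grows with the
product of the two sequence lengths, which is the quadratic behaviour mentioned above.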
The assumptions and inferences made are based on evolutionary change and constitute the
context in which the alignment process takes place. An optimal alignment is the one with the
maximum number of matches and the minimum number of mismatches and gaps. The score of an
alignment is the sum of its position scores. The gap penalty used in the scoring scheme is
important: it helps to decide whether or not to accept a gap or insertion in an alignment when a
good alignment could instead be achieved at some neighbouring point in the sequence. Gaps and
insertions cannot be allowed to occur without penalty; otherwise an unreasonable alignment
full of gaps would result. Biologically, it is more natural for a protein to accept a different
residue in a position than to have parts of the sequence deleted or inserted. Gaps and insertions
should therefore be rarer than point mutations/substitutions [2].
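The scoring rule above (the score of an alignment is the sum of its position scores, with a
penalty for gaps) can be sketched as follows. The function name alignment_score and the
numeric values for match, mismatch and gap are illustrative assumptions, not values given in the
source; the gap penalty is simply chosen to be larger than the mismatch penalty so that gaps are
rarer than substitutions.

```python
def alignment_score(row_a: str, row_b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum the per-column scores of two already aligned rows ('-' marks a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")

    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap        # a gap in either row is penalised
        elif x == y:
            total += match      # identical residues
        else:
            total += mismatch   # substitution
    return total


if __name__ == "__main__":
    print(alignment_score("GATTACA", "GA-TACA"))         # gaps penalised
    print(alignment_score("GATTACA", "GA-TACA", gap=0))  # gaps free of charge
```

Setting gap=0 in this sketch shows why a penalty is needed: gaps then cost nothing, so an
alignment can insert them freely to avoid every mismatch.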
In pairwise alignments, there is a two-dimensional matrix with one sequence on each axis. The
elements in the matrix are initially the substitution coefficients, which are then operated on to
locate the best path through the matrix. The number of operations required to do this is
approximately proportional to the product of the lengths of the two sequences. A dot plot, as a
graphical tool, can help in aligning two sequences. Pairwise sequence alignment is the basis for
other analyses, including experimental tasks such as PCR primer design. However, there are
some problems with pairwise alignments. For example, when many sequences that are
significantly similar to the query sequence are obtained, comparing each sequence to every other
may become impractical as the number of sequences increases. In that case, multiple sequence
alignment is employed, where all similar sequences can be compared in a single figure or table.
The basic idea is that the sequences are aligned on top of each other, so that a co-ordinate
system is set up in which each row is the sequence of one protein and each column is the same
position in each sequence. Each column corresponds to a specific residue in the prototypical
protein. One may have to introduce gaps in sequences at positions where there were no gaps in
the corresponding pairwise alignment; thus, multiple alignments typically contain more gaps
than any given pair of aligned sequences.
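The pairwise matrix and the best path through it can be illustrated with a small global
alignment in the style of Needleman and Wunsch. This is only a sketch under stated
assumptions: the source does not name a specific algorithm, and the function name global_align,
the simple match/mismatch scores (used here in place of a real substitution matrix) and the
linear gap penalty are all chosen for illustration.

```python
def global_align(a: str, b: str,
                 match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for a global pairwise alignment."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two residues
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a

    # Trace the best path back from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    best, row_a, row_b = global_align("GATTACA", "GATACA")
    print(best)
    print(row_a)
    print(row_b)
```

The two nested loops visit every cell of the (n + 1) by (m + 1) matrix once, which is where the
cost proportional to the product of the two lengths comes from. Where several paths tie for the
best score, this sketch returns just one of them.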