and editing the attributes given above over the colour-coded alignments are revisited, and the
values are inserted into the work where necessary. In some cases, even when a lot of information
about the proteins is available (active site residues, secondary structure, 3D structure,
mutations, etc.), it may still be necessary to make a manual alignment that fits all the data [2].
There is exponential growth in the number of known sequences and of sequence and structure
alignments. The analysis data from those studies should be geared to the needs of
bioinformaticians. For example, the decision whether two sequences are merely similar or truly
homologous affects the whole process. It must also be considered that certain regions (structural
and functional) contain more crucial residues than others. When two aligned protein sequences
share more than 25 % identical residues, the corresponding 3D structures are said to be very
similar, implying similar functionality. Therefore, the sequence alignment of proteins remains an
approximate predictor of the underlying 3D structural alignment. However, experimental findings
on the evolutionary background should consolidate these studies [3].
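The 25 % identity criterion mentioned above can be checked with a few lines of code. The following
is only a minimal sketch: the function name, the example sequences and the choice to ignore
columns containing a gap are assumptions made for illustration (other conventions divide by the
full alignment length or by the length of the shorter sequence).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: columns with a gap are not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical pair of already aligned sequences (not taken from the text).
seq_a = "MKT-AYIAKQR"
seq_b = "MKTLAYLA-QR"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f} % identical, above 25 % threshold: {identity > 25.0}")
```

Here 8 of the 9 gap-free columns are identical, so the pair lies well above the threshold; as the
text notes, this is only an approximate indication of structural similarity.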
Operations such as match, mismatch, insertion, deletion and the introduction of gaps of varying
length, with different definitions and even different scoring subschemes, can be used in scoring
schemes. Depending on the context, some changes are more plausible than others, and a
probabilistic interpretation is made of how likely one alignment is compared with another. Success
depends not only on parameters such as insertion and deletion penalties and substitution
coefficients but also on the order in which sequences are added to the multiple alignment. A
number of rules are used to increase the success rate of the procedure; for example, each sequence
is weighted according to how different it is from the other sequences. Among the many possible
scoring schemes, one can employ position-specific scores. For example, if one knows from other
sources, such as the 3D structure, that a gap should not be allowed in a certain part of a
sequence, then higher gap penalty values can be set for that part in the relevant calculation.
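A position-specific gap penalty of this kind can be sketched with a small global alignment by
dynamic programming. The code below is an illustration under stated assumptions, not the scheme
of any particular program: the function name, the simple match/mismatch scores and the linear
(per-residue) gap cost are all hypothetical choices. The list gap_in_a holds, for every position
in sequence a, the cost of placing a gap character there, so a high value discourages a gap in
that part of the sequence.

```python
def align_with_position_gaps(a, b, gap_in_a, gap_in_b=2, match=1, mismatch=-1):
    """Global alignment where the cost of a gap in `a` depends on its position.

    gap_in_a[i] is the cost of one gap character inserted into `a` after i of
    its residues have been placed (so it needs len(a) + 1 entries).
    gap_in_b is a uniform cost for one gap character in `b`.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    if len(gap_in_a) != n + 1:
        raise ValueError("gap_in_a must have len(a) + 1 entries")

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap_in_b
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap_in_a[0]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,         # residue against residue
                score[i - 1][j] - gap_in_b,        # gap in b
                score[i][j - 1] - gap_in_a[i],     # gap in a at position i
            )

    # Trace back one optimal alignment (diagonal moves are preferred on ties).
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] - gap_in_b:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
            continue
        out_a.append("-"); out_b.append(b[j - 1])
        j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Uniform penalties: the gap falls between C and T.
print(align_with_position_gaps("ACT", "ACGT", gap_in_a=[2, 2, 2, 2]))
# A high penalty at position 2 pushes the gap elsewhere, at a lower total score.
print(align_with_position_gaps("ACT", "ACGT", gap_in_a=[2, 2, 10, 2]))
```

In practice the substitution scores would come from a substitution matrix and gap costs are often
affine (separate opening and extension costs); both are left out here to keep the sketch short.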
In the overall calculation, the use of local or global alignments, or a combination of them,
should be considered, whichever fits better. Local alignments, which align the regions with a
high degree of similarity in two sequences rather than aligning them globally from end to end,
may be preferred and carried out to support the global alignment. Sort and search techniques may
be borrowed in running the alignment procedure based on the contextual information. A
context-sensitive grammar may be formed to model the contextual information within the enacted
environment of the related process. Clustering of large multiple alignments, supported with
alternative representations, could well be performed. How can we represent a pattern of residues
as found in a multiple alignment? And how can we use such a pattern to search for it in other
protein sequences? The formalism devised to describe the kind of patterns we need is the regular
expression, which describes particular languages in restricted cases.
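As an illustration of searching with such a pattern, the sketch below translates a small subset
of the PROSITE pattern notation into a regular expression and scans a protein sequence with it.
The use of PROSITE notation, the function names and the test sequence are assumptions introduced
here, not something prescribed by the text; only single residues, x, [..], {..} and repetition
counts are handled.

```python
import re

# One pattern element: x, a residue, [allowed] or {forbidden}, optionally (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            regex = core                       # a residue or a [set] of residues
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def find_pattern(pattern: str, sequence: str):
    """Return (start, matched_text) for every occurrence, including overlapping ones."""
    lookahead = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]


# N-glycosylation motif in PROSITE notation: N, not P, S or T, not P.
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_pattern("N-{P}-[ST]-{P}", "MNGTANPSQNLSA"))
```

The lookahead wrapper is a design choice so that overlapping occurrences are reported; a plain
search would skip a match that begins inside a previous one.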
The selection and employment of algorithms constitute the major issue when we are searching
large databases. For example, on a database of size 10^9 one cannot run a DP algorithm for a
query string of length up to 500, because the running time, which grows with the product of the
query length and the database size, is prohibitive. However, this problem can be handled in
different ways: (a) Implementing the DP algorithms in hardware, thus executing them much faster.
The disadvantage is the high cost. Furthermore, by using parallel hardware, the problem can be
distributed efficiently over a couple of thousand processors, and the results can be integrated
later. This approach is costly, too. (b) Using heuristics that work much faster than the
original, exact DP algorithms. Here are some measures to take: because of the huge database
size, the rather stable portions of the database are preprocessed; substitutions are taken to be
much more likely than insertions and deletions; and homologous sequences are expected to contain
many segments with matches or substitutions but without insertions, deletions or gaps. These
segments can be used as starting points for further searching [4].
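The preprocessing and seeding ideas in (b) can be sketched as follows. This is a simplified
illustration, not the procedure of any specific search tool: the function names, the word length
k = 3 and the toy database are assumptions, and only exact word matches are used as seeds
(substitutions inside a word are not considered). The database is indexed once by its words of
length k, each word of the query is looked up in that index, and hits that fall on the same
diagonal (database position minus query position) point to a gap-free segment worth examining
further.

```python
from collections import Counter, defaultdict


def build_kmer_index(database: dict, k: int) -> dict:
    """Preprocess the database once: map each word of length k to its (name, position) hits."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def find_seeds(query: str, index: dict, k: int) -> list:
    """Return (name, query_pos, db_pos) for every exact word shared by query and database."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        for name, dpos in index.get(query[qpos:qpos + k], []):
            seeds.append((name, qpos, dpos))
    return seeds


def rank_diagonals(seeds: list) -> list:
    """Count seeds per (sequence, diagonal); many seeds on one diagonal suggest a gap-free segment."""
    counts = Counter((name, dpos - qpos) for name, qpos, dpos in seeds)
    return counts.most_common()


# Hypothetical toy database and query.
database = {"seqA": "MKTAYIAKQRQISFVK", "seqB": "GGSTAYIAKPLLW"}
index = build_kmer_index(database, k=3)
seeds = find_seeds("TAYIAKQR", index, k=3)
print(rank_diagonals(seeds))
```

Only the best-ranked diagonals would then be passed on to a more expensive step, such as a DP
alignment restricted to the neighbourhood of the seed.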
Learning algorithms of artificial neural networks, supported with uncertainty, probabilities,
fuzziness and heuristics, could be utilised, so that the learning mechanism can steer the running of the