Fig. 1.2 Illustration of sequence-sequence, sequence-profile and profile-profile comparison
methods for homology detection and fold recognition. A node represents a protein, and the distance
between two nodes represents how closely related they are. The large circles in red, blue and green indicate
three different protein families with a similar fold. In this figure, the two proteins marked 1 belong
to the same protein family, so their homologous relationship can be detected through sequence-sequence
comparison. The two proteins marked 2 are not in the same protein family, but they are
still evolutionarily related, and their homologous relationship can be recognized by sequence-profile
comparison. The two proteins marked 3 are distantly related but have similar folds, and their
relationship may be recognized by profile-profile comparison.
Sequence-sequence (purely sequence-based) methods detect homologs mainly by
aligning two primary sequences. They work well for close homologs but not for
remote homology detection. Existing sequence-based methods differ mainly in their
alignment algorithm, amino acid substitution scores and gap penalty. Some methods,
such as the Needleman-Wunsch [ 35 ] and Smith-Waterman [ 36 ] algorithms, use dynamic
programming to build alignments, while others, such as BLAST [ 37 ] and FASTA [ 38 ],
use more efficient heuristic alignment algorithms. BLOSUM [ 39 ] and PAM [ 39 ] are
two widely used amino acid substitution matrices for scoring the similarity of two
aligned residues. Gaps (i.e., unaligned residues) in an alignment are penalized with an
affine function of gap length: a fixed gap-opening penalty plus a penalty for each
position by which the gap is extended.
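The dynamic-programming idea behind the Smith-Waterman algorithm, combined with an affine gap penalty, can be sketched in Python as follows. This is a minimal illustration, not the implementation used by any of the cited tools. It follows Gotoh's three-matrix formulation and returns only the best local alignment score. The function names smith_waterman_affine and match_mismatch, the gap scores (-10 to open, -1 to extend) and the simple match/mismatch scorer standing in for BLOSUM or PAM are all assumptions chosen for readability.

```python
def match_mismatch(x: str, y: str) -> int:
    # Hypothetical stand-in for a real substitution matrix such as BLOSUM62 or PAM250.
    return 5 if x == y else -4


def smith_waterman_affine(a, b, score, gap_open=-10, gap_extend=-1):
    """Best local alignment score of a and b (Smith-Waterman, Gotoh affine gaps).

    A gap of length k scores gap_open + (k - 1) * gap_extend.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # H[i][j]: best score of a local alignment ending at cell (i, j), in any state.
    # E[i][j]: best score of an alignment ending with b[j-1] opposite a gap.
    # F[i][j]: best score of an alignment ending with a[i-1] opposite a gap.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing gap.
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            # Align a[i-1] with b[j-1], end in a gap, or restart at 0 (local alignment).
            H[i][j] = max(0, H[i - 1][j - 1] + score(a[i - 1], b[j - 1]), E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(smith_waterman_affine("ACDEFGHIK", "ACDEGHIK", match_mismatch))  # 30
```

In the example, eight matched residues score 40 and the single opened gap (F aligned to nothing) costs 10. A longer gap of k residues would cost 10 + (k - 1), which is why affine penalties favor one long gap over several short ones. Replacing match_mismatch with a lookup into a real substitution matrix changes only the score function.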
Alignment-based homology detection can be improved by using evolutionary
information such as a PSI-BLAST sequence profile or a profile Hidden Markov
Model (HMM) [ 6 ]. A few methods have been developed to align one primary
sequence to one sequence profile [ 37 ] or to align two sequence profiles. For example,
HMMER [ 40 ] and SAM [ 41 ] are two tools that align one primary sequence to one
profile.
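A sequence-profile comparison can be sketched in the same style. First, a small multiple sequence alignment is turned into a position-specific scoring matrix (PSSM) of log-odds scores. Then a primary sequence is aligned to the profile columns by dynamic programming. This is a deliberately simplified toy, not how PSI-BLAST, HMMER or SAM work internally. It assumes an ungapped input alignment, add-one pseudocounts, a uniform background distribution, global alignment and a linear gap penalty. The names build_pssm and align_to_profile are hypothetical.

```python
import math

# Assumption: the 20 standard amino acids with a uniform background frequency.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(msa, pseudocount=1.0):
    """Log-odds PSSM (one dict per column) from an ungapped multiple alignment."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*msa):  # iterate over the alignment's columns
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        # log2(observed frequency with pseudocounts / background frequency)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def align_to_profile(seq, pssm, gap=-4.0):
    """Global sequence-to-profile alignment score with a linear gap penalty."""
    n, m = len(seq), len(pssm)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(
                D[i - 1][j - 1] + pssm[j - 1][seq[i - 1]],  # residue vs. profile column
                D[i - 1][j] + gap,  # residue left unaligned
                D[i][j - 1] + gap,  # profile column skipped
            )
    return D[n][m]


msa = ["ACDE", "ACDE", "ACNE"]
pssm = build_pssm(msa)
print(align_to_profile("ACDE", pssm) > align_to_profile("WWWW", pssm))  # True
```

The key difference from the pairwise sketch is that each residue is scored against a column-specific distribution rather than a single residue. A conserved column such as the first one (all A) therefore rewards a matching residue more strongly than a variable column does.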