Chapter 1
Introduction
Abstract This chapter provides background and surveys popular existing methods
for homology detection and fold recognition. In particular, it reviews
homology detection methods from the following perspectives: alignment-free
versus alignment-based, sequence-based versus profile-based, and generative
versus discriminative machine learning. Finally, the chapter reviews a few
popular scoring functions for sequence-based or profile-based protein alignment.
Keywords Homology detection · Fold recognition · Alignment-free homology
detection · Alignment-based homology detection · Profile-based protein alignment
1.1 Background
High-throughput genome sequencing has been yielding a large number of
biological sequences without accurate functional and structural annotations
[1, 2]. Because of experimental complications and obstacles in structural and
functional analysis, the gap between the number of available protein sequences
and the number of proteins with experimentally determined structures and
functions has greatly increased in recent years [3, 4]. As such, novel
bioinformatics methods that can link unannotated proteins to their homologs
with accurate annotations are needed. Because a large percentage of proteins
have no solved structures, it is important to unravel protein relationships
using sequence information. Homology detection and fold recognition are two
essential techniques used to determine whether two proteins are homologous or
share similar folds [5, 6].
Two proteins are said to be homologous if they share a common evolutionary
origin. Sequence information is often used to infer whether proteins are
homologous and how they are related structurally and functionally. If two
proteins share high sequence similarity, say above 40% sequence identity
[7, 8], they are very likely to be homologous and to have similar structures
and, in many cases, similar functions. It has been observed that proteins
sharing low sequence identity may still be remotely homologous. Nevertheless,
homology detection is very challenging when