Introduction - Protein Homology Detection Through Alignment of Markov Random Fields


Chapter 1

Introduction

Abstract This chapter provides background and surveys popular existing methods for homology detection and fold recognition. In particular, it reviews homology detection methods from the following perspectives: alignment-free versus alignment-based, sequence-based versus profile-based, and generative versus discriminative machine learning. Finally, it reviews a few popular scoring functions for sequence-based or profile-based protein alignment.

Keywords Homology detection · Fold recognition · Alignment-free homology detection · Alignment-based homology detection · Profile-based protein alignment

1.1 Background

High-throughput genome sequencing has been yielding a large number of biological sequences without accurate functional and structural annotations [1, 2]. Due to experimental complications and obstacles in structural and functional analysis, the gap between the number of available protein sequences and the number of proteins with experimentally determined structures and functions has greatly increased in recent years [3, 4]. As such, novel bioinformatics methods that can link proteins without annotations to their homologs with accurate annotations are needed. Because a large percentage of proteins have no solved structures, it is important to unravel protein relationships using sequence information. Homology detection and fold recognition are two essential techniques for determining whether two proteins are homologous or share similar folds [5, 6].

Two proteins are said to be homologous if they share a common evolutionary origin. Sequence information is often used to infer whether proteins are homologous, as well as their structural and functional relationships. If two proteins share high sequence similarity, say above 40% sequence identity [7, 8], they are very likely to be homologous and to have similar structures and, in many cases, similar functions. It has been observed that proteins sharing low sequence identity may still be remotely homologous. Nevertheless, homology detection is very challenging when
