One of the central problems in this area of machine learning is assigning labels to sequences of
objects. This class of problems has been called sequential supervised learning [7].
Probabilistic graphical models, and in particular hidden Markov models (HMMs),
have been quite successful in modeling sequential phenomena. However, this model
has two main weaknesses: (1) it handles only sequences over flat alphabets (i.e., the
sequential objects have no internal structure), and (2) it is hard to express dependencies
in the input data. Recently, to overcome the first problem, the work in [16] introduced
logical hidden Markov models (LoHMMs), an extension of HMMs that handles
sequences of logical atoms. However, the second problem remains for LoHMMs.
For this reason, conditional random fields (CRFs) [12] have been proposed. CRFs
are discriminatively trained graphical models, in contrast to generatively trained
models such as HMMs. CRFs can easily handle non-independent input features and
represent the conditional probability distribution P(Y | X), where X represents elements
of the input space and Y elements of the output space. For many tasks in computational
biology, information extraction, or user modeling, CRFs have outperformed HMMs.
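To make the discriminative formulation concrete, the following toy Python sketch (our illustration, not code from [12]) computes P(Y | X) for a linear-chain CRF by brute-force enumeration over all label sequences. The labels, feature names, and weights are invented for illustration; real implementations compute the normalizer with the forward-backward algorithm rather than enumeration.

from itertools import product
from math import exp

LABELS = ["helix", "strand"]

def features(x, y_prev, y_curr, t):
    """Feature functions may inspect the whole input x: exactly the kind of
    non-independent input feature that is hard to express in an HMM."""
    return {
        f"obs={x[t]}|y={y_curr}": 1.0,        # local observation feature
        f"trans={y_prev}->{y_curr}": 1.0,     # label transition feature
        f"first_obs={x[0]}|y={y_curr}": 1.0,  # long-range feature over x
    }

def score(x, y, weights):
    """Unnormalized log-score: weighted sum of features over all positions."""
    s, y_prev = 0.0, "START"
    for t, y_curr in enumerate(y):
        for name, value in features(x, y_prev, y_curr, t).items():
            s += weights.get(name, 0.0) * value
        y_prev = y_curr
    return s

def conditional_prob(x, y, weights):
    """P(y | x) = exp(score(x, y)) / Z(x), with Z(x) summed by brute force."""
    z = sum(exp(score(x, yp, weights)) for yp in product(LABELS, repeat=len(x)))
    return exp(score(x, y, weights)) / z

# Toy usage with hand-set weights; training would maximize log P(Y | X).
w = {"obs=e|y=strand": 2.0, "obs=h|y=helix": 2.0, "trans=helix->helix": 0.5}
print(conditional_prob(["h", "h", "e"], ["helix", "helix", "strand"], w))

Note that the feature function above is free to condition on the entire observation sequence x, which is what distinguishes the discriminative CRF from a generative HMM.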
One of the problems where sequences exhibit internal structure is modeling
sequences of protein secondary structure. These sequences can be seen as
sequences of logical atoms (details about logic can be found in [8]). For example, the
following sequence of the TIM beta/alpha-barrel protein is represented as a sequence of
logical atoms:
st('SB', null, medium), st('SB', plus, medium), he(h(right, alpha), long),
st('SB', plus, medium), he(h(right, alpha), medium), ...
Helices and strands are represented by he(type, length) and
st(orientation, length), respectively. Traditional HMMs or CRFs would either ignore
the structure of the symbols in the sequence, thereby losing the information that each
symbol carries, or take all possible combinations (of orientation and length) into
account, which could lead to a combinatorial explosion in the number of
parameters.
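The following small Python sketch illustrates this trade-off; the argument domains are assumed for illustration and are not taken from the data above. Flattening every ground atom into an opaque symbol multiplies the alphabet size, while a logical atom with variables covers many ground instances at once.

from itertools import product

# Assumed argument domains (for illustration only).
orientations = ["plus", "minus", "null"]
lengths = ["short", "medium", "long"]
helix_types = ["alpha", "3_10", "pi"]

# Flat alphabet: one opaque symbol per fully ground atom. Its size is the
# product of the argument domains, so it grows multiplicatively.
flat_alphabet = (
    [f"st('SB',{o},{l})" for o, l in product(orientations, lengths)]
    + [f"he(h(right,{t}),{l})" for t, l in product(helix_types, lengths)]
)
print(len(flat_alphabet))  # 9 + 9 = 18 distinct symbols already

# Logical abstraction: one atom with variables covers many ground atoms.
def matches(pattern, atom):
    """Unification-lite: '_' in the pattern matches any argument."""
    return len(pattern) == len(atom) and all(
        p == "_" or p == a for p, a in zip(pattern, atom)
    )

# he(h(right, _), _): any right-handed helix, whatever its type and length.
print(matches(("he", "right", "_", "_"), ("he", "right", "alpha", "long")))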
The first approach to dealing with sequences of logical atoms by extending CRFs
is that of [10], where the authors propose TildeCRF, which uses relational regression
trees in the gradient tree boosting approach [7] to achieve relational abstraction
through logical variables and unification. The authors showed that TildeCRF
outperformed previous approaches based on LoHMMs such as [6, 10].
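As a rough intuition for the learning procedure, the sketch below shows gradient tree boosting in a simplified propositional setting: the potential function for each label is grown as a sum of regression trees, each fit to the functional gradient I(y = k) - P(k | x). TildeCRF replaces these propositional trees with relational regression trees over logical atoms, which is not reproduced here; all data and parameters below are toy assumptions.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def softmax(F):
    e = np.exp(F - F.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def boost_potentials(X, y, n_classes, n_rounds=20, depth=2):
    """Grow each potential F_k as a sum of small regression trees."""
    trees = [[] for _ in range(n_classes)]
    F = np.zeros((len(X), n_classes))   # current potential values
    Y = np.eye(n_classes)[y]            # one-hot targets I(y = k)
    for _ in range(n_rounds):
        P = softmax(F)
        for k in range(n_classes):
            grad = Y[:, k] - P[:, k]    # functional gradient at each example
            tree = DecisionTreeRegressor(max_depth=depth).fit(X, grad)
            trees[k].append(tree)
            F[:, k] += tree.predict(X)  # gradient step in function space
    return trees

def predict(trees, X):
    F = np.column_stack([sum(t.predict(X) for t in ts) for ts in trees])
    return softmax(F)

# Toy usage with random features standing in for windowed observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = boost_potentials(X, y, n_classes=2)
print(predict(model, X[:5]))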
Many real-world application domains are characterized by both uncertainty and
complex relational structure. Statistical learning focuses on the former, and relational
learning on the latter. Statistical relational learning [9] aims at combining
the power of both. One of the representation formalisms in this area is Markov
Logic, which subsumes both finite first-order logic and probabilistic graphical models
as special cases [23]. On top of this formalism, Markov logic networks (MLNs) can
be built, serving as templates for constructing Markov networks (MNs). In Markov
Logic, a weight is attached to each clause, and learning an MLN consists of structure
learning (learning the clauses) and weight learning (setting the weight of each
clause).
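As a toy illustration of these ideas (our example, not taken from [23]), the sketch below grounds two weighted clauses over a two-constant domain and computes a marginal by brute-force enumeration over possible worlds, using P(world) proportional to exp(sum_i w_i n_i(world)), where n_i counts the satisfied groundings of clause i; the predicates and weights are assumptions made up for this example.

from itertools import product
from math import exp

constants = ["anna", "bob"]
# Ground atoms of two unary predicates.
atoms = [f"smokes({c})" for c in constants] + [f"cancer({c})" for c in constants]

def ground_clauses():
    """Ground the weighted clause  smokes(X) => cancer(X)  (weight 1.5)
    and the unit clause  smokes(X)  (weight 0.5) for every constant."""
    for c in constants:
        yield 1.5, lambda w, c=c: (not w[f"smokes({c})"]) or w[f"cancer({c})"]
        yield 0.5, lambda w, c=c: w[f"smokes({c})"]

def world_weight(world):
    """exp of the total weight of ground clauses satisfied in this world."""
    return exp(sum(wgt for wgt, sat in ground_clauses() if sat(world)))

worlds = [dict(zip(atoms, bits)) for bits in product([False, True], repeat=len(atoms))]
Z = sum(world_weight(w) for w in worlds)  # partition function of the ground MN

# Marginal probability that anna has cancer, by brute-force enumeration.
p = sum(world_weight(w) for w in worlds if w["cancer(anna)"]) / Z
print(round(p, 3))

The grounding step is what makes the MLN a template: the same two weighted clauses would yield a larger Markov network for a larger set of constants.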