Protein Fold Recognition Using Markov Logic Networks - Mathematical Approaches to Polymer Sequence Analysis and Related Problems


a cluster of solutions. A cluster is usually a set of connected solutions: any two solutions in the cluster can be reached from one another through a series of flips without leaving the cluster. In many domains of interest, solutions occur in clusters, and it is useful to explore such a cluster without leaving it. SA explores a connected space well; it therefore samples near-uniformly and often visits all the neighboring solutions.
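
To make the notion of a cluster concrete, the following minimal sketch enumerates the solutions of a small propositional formula and groups them into clusters, treating two solutions as neighbors when they differ by a single flip. It is an illustration only: the clause encoding, the function names `satisfies` and `solution_clusters`, and the brute-force enumeration are assumptions made here, not part of MC-IRoTS, and the enumeration is feasible only for a handful of variables.

```python
from itertools import product


def satisfies(assignment, clauses):
    """Return True if every clause has at least one satisfied literal.

    A clause is a list of (variable_index, required_value) literals
    (a hypothetical encoding chosen for this sketch).
    """
    return all(
        any(assignment[var] == value for var, value in clause)
        for clause in clauses
    )


def solution_clusters(n_vars, clauses):
    """Group satisfying assignments into clusters connected by single flips."""
    solutions = {
        a for a in product((False, True), repeat=n_vars)
        if satisfies(a, clauses)
    }
    clusters, seen = [], set()
    for start in sorted(solutions):
        if start in seen:
            continue
        # Depth-first search that never steps outside the solution set.
        cluster, stack = set(), [start]
        seen.add(start)
        while stack:
            current = stack.pop()
            cluster.add(current)
            for i in range(n_vars):
                # Flip variable i and keep the neighbor only if it is a solution.
                neighbor = current[:i] + (not current[i],) + current[i + 1:]
                if neighbor in solutions and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(cluster)
    return clusters


# Example: the formula x0 <-> x1, written as two clauses.
equivalence = [[(0, True), (1, False)], [(0, False), (1, True)]]
clusters = solution_clusters(2, equivalence)
```

In this example the two solutions differ in both variables, so no single flip connects them and each forms its own cluster. A sampler that moves by single flips would therefore stay inside whichever cluster it starts in.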

Through MC-IRoTS, we can perform conditional inference given evidence to compute probabilities for query predicates. These probabilities can be used to make predictions from the model.
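
As a rough illustration of how sampled states turn into probabilities, the sketch below estimates the marginal probability of each query atom as the fraction of samples in which it is true. The sampler itself is not shown: the samples are assumed to have been drawn already, conditioned on the evidence, and the function name `estimate_marginals`, the dictionary representation of a sample, and the atom names `Q1` and `Q2` are hypothetical.

```python
from collections import Counter


def estimate_marginals(samples, query_atoms):
    """Estimate P(atom is true | evidence) as a frequency over samples.

    `samples` is a list of dicts mapping ground-atom names to truth values,
    assumed to come from a sampler run with the evidence held fixed.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    true_counts = Counter()
    for world in samples:
        for atom in query_atoms:
            if world[atom]:
                true_counts[atom] += 1
    return {atom: true_counts[atom] / len(samples) for atom in query_atoms}


# Four hypothetical samples over two query atoms.
samples = [
    {"Q1": True, "Q2": False},
    {"Q1": True, "Q2": True},
    {"Q1": False, "Q2": False},
    {"Q1": True, "Q2": False},
]
marginals = estimate_marginals(samples, ["Q1", "Q2"])
```

A prediction can then be made by thresholding each estimated probability, for example by predicting an atom to be true when its estimate exceeds 0.5.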

4.6.2 Discriminative Learning by Sampling with MC-IRoTS

Discriminative approaches to weight learning try to optimize the conditional log-likelihood (CLL). Preconditioned scaled conjugate gradient (PSCG) is the state-of-the-art discriminative training algorithm for MLNs, and it was shown in [15] to outperform the voted perceptron. PSCG is a conjugate gradient method that chooses its step size from the Hessian, approximated for MLNs with samples from MC-SAT, rather than by line search. This approach is also known as scaled conjugate gradient and was originally proposed in [20] for training neural networks. In each iteration, PSCG takes a step in the diagonalized Newton direction (for details, see [15]). Here, we propose to use MC-IRoTS as the sampler for approximating the Hessian for SMLNs. The goal is to obtain samples from MC-IRoTS that serve as good estimates for computing the Hessian.
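
The role of the samples can be sketched with a simplified diagonal Newton update. The sketch assumes the standard log-linear form of an MLN, under which the gradient of the CLL with respect to a weight is the observed count of true groundings of its formula minus the expected count, and the corresponding diagonal Hessian entry is (up to sign) the variance of that count. Both the expectation and the variance are estimated from sampled counts. This is not the full PSCG algorithm of [15]: the conjugate direction and the adaptive step size are omitted, and the function name, the fixed `step_size`, and the zero-variance guard are assumptions made for illustration.

```python
def diagonal_newton_step(weights, observed_counts, sampled_counts,
                         step_size=1.0, min_variance=1e-8):
    """One simplified diagonal Newton update of formula weights.

    weights[i]           current weight of formula i
    observed_counts[i]   true groundings of formula i in the training data
    sampled_counts[s][i] true groundings of formula i in sample s
    """
    n_samples = len(sampled_counts)
    new_weights = []
    for i, weight in enumerate(weights):
        column = [sample[i] for sample in sampled_counts]
        mean = sum(column) / n_samples
        variance = sum((c - mean) ** 2 for c in column) / n_samples
        gradient = observed_counts[i] - mean  # dCLL/dw_i, estimated from samples
        if variance < min_variance:
            # No curvature information from the samples: leave the weight as is.
            new_weights.append(weight)
        else:
            # Scale the gradient by the inverse of the diagonal Hessian entry.
            new_weights.append(weight + step_size * gradient / variance)
    return new_weights


# One formula, observed count 3, two samples with counts 1 and 3.
updated = diagonal_newton_step([0.0], [3], [[1], [3]])
```

Because every quantity in the update is a sample statistic, the quality of the step depends directly on how well the sampler covers the relevant states, which is the motivation for drawing the samples with MC-IRoTS.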

4.7 Modeling Protein Sequences in SMLNs

In this section, we describe how sequences of protein secondary structure can be modeled in SMLNs, how to learn model parameters from the data, and how to make predictions from the model.

4.7.1 Model Construction and Weight Learning

The approach we follow is simple: we write a few formulas that represent the structure of the domain and then learn the weights of these formulas from the training sequences.

We use the dataset of [10]. The data consist of logical sequences of the secondary structure of protein domains:

beginSequence :

strand . 0 SB 0 ; null ; medium /:
