Biology Reference
In-Depth Information
sequence selection procedure is based on optimizing a pairwise
distance-dependent interaction potential. This statistically based
empirical energy function assigns an energy value to each interaction
between amino acids in the protein based on the alpha-carbon
separation distance of the pair. Structure-based pairwise potentials
of this kind are fast to evaluate and have been used in fold
recognition and fold prediction [51]. One advantage of this approach is
that there is no need to derive empirical weights to account for
individual residue propensities. Moreover, the possibility that such
interaction potentials lack sensitivity to local atomic structure is
addressed within the context of the overall two-stage approach. In
fact, the coarser nature of the energy function in the in silico
sequence selection phase may prove beneficial in that it allows for an
inherent flexibility of the backbone.
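As a rough illustration of why such potentials are fast to evaluate, the following Python sketch sums a pairwise energy over all unique residue pairs using only alpha-carbon coordinates. The function names and the toy potential (its 8 Å cutoff and its hydrophobic residue set) are assumptions made for illustration only; they are not the potential of [51] or of the procedure described here.

```python
import math
from itertools import combinations


def total_pair_energy(sequence, ca_coords, pair_energy):
    """Sum a pairwise potential over all unique residue pairs (i < k).

    sequence    -- string of one-letter amino acid codes
    ca_coords   -- list of (x, y, z) alpha-carbon coordinates in angstroms
    pair_energy -- callable (aa_i, aa_k, distance) -> energy
    """
    if len(sequence) != len(ca_coords):
        raise ValueError("sequence and ca_coords must have the same length")
    energy = 0.0
    for i, k in combinations(range(len(sequence)), 2):
        # Only the alpha-carbon separation enters the energy.
        d = math.dist(ca_coords[i], ca_coords[k])
        energy += pair_energy(sequence[i], sequence[k], d)
    return energy


def toy_pair_energy(aa_i, aa_k, distance):
    """Hypothetical potential: reward close hydrophobic pairs, ignore the rest."""
    hydrophobic = set("AVLIMFW")  # illustrative choice, not from the text
    if distance < 8.0 and aa_i in hydrophobic and aa_k in hydrophobic:
        return -1.0
    return 0.0
```

The cost is one table lookup (here, one function call) per residue pair, with no side-chain atoms or rotamers involved, which is consistent with the coarse-grained character described above.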
A number of different parameterizations for pairwise residue
interaction potentials exist. The simplest approach is a binary
contact model, in which each contact between two amino acids is
assigned an energy according to the residue types, and a contact is
defined as a separation of less than 6.5 Å between the side chains of
the two amino acids [52]. An improvement on this model incorporates
distance dependence into the energy of each amino acid interaction.
Specifically, the alpha-carbon distances are discretized into a set of
13 bins to create a finite number of interactions, the parameters of
which were derived from a linear optimization formulated to favor
native folds over optimized decoy structures [53,54]. The use of a
distance-dependent potential allows for the implicit inclusion of side
chains and the specificity of amino acids. The resulting potential,
which involves 2730 parameters, was shown to provide higher Z scores
than other potentials and to place native folds lower in energy
[53,54].
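The parameter count can be checked with a small Python sketch. With 20 amino acid types there are 210 unordered residue-type pairs (including like pairs), and 210 pairs times 13 distance bins gives 2730, which matches the number quoted above; reading the count this way is an inference from the arithmetic, not something the text states. The bin edges below are hypothetical (evenly spaced 1 Å bins from 3 to 16 Å), since the actual edges are not given here, and the function names are illustrative.

```python
import bisect
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard types, alphabetical
N_BINS = 13                           # number of distance bins, from the text
CONTACT_CUTOFF = 6.5                  # angstroms, binary contact model cutoff

# All unordered residue-type pairs, including like pairs such as (A, A).
PAIRS = list(combinations_with_replacement(AMINO_ACIDS, 2))
PAIR_INDEX = {pair: n for n, pair in enumerate(PAIRS)}
N_PARAMS = len(PAIRS) * N_BINS

# Hypothetical bin edges; the real discretization is not specified here.
BIN_EDGES = [3.0 + i for i in range(N_BINS + 1)]


def in_contact(side_chain_distance):
    """Binary contact model: contact if the separation is below the cutoff."""
    return side_chain_distance < CONTACT_CUTOFF


def distance_bin(ca_distance):
    """Return the bin index (0..N_BINS-1), or None outside the binned range."""
    if ca_distance < BIN_EDGES[0] or ca_distance >= BIN_EDGES[-1]:
        return None
    return bisect.bisect_right(BIN_EDGES, ca_distance) - 1


def param_index(aa_i, aa_k, ca_distance):
    """Map (residue pair, distance) to a flat parameter index, or None."""
    b = distance_bin(ca_distance)
    if b is None:
        return None
    pair = tuple(sorted((aa_i, aa_k)))  # order of the two residues is irrelevant
    return PAIR_INDEX[pair] * N_BINS + b
```

Each residue pair at a given alpha-carbon distance then selects exactly one of the `N_PARAMS` entries in a flat parameter table, or none if the distance lies outside the binned range.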
The linearity of the resulting formulation based on this distance-
dependent interaction potential [55] is also an attractive characteristic
of the in silico sequence selection procedure. The development of the
formulation can be understood by first describing the variable set over
which the energy function is optimized. First, consider the set
$i = 1, \ldots, n$, which defines the number of residue positions along
the backbone. At each position $i$ there can be a set of mutations
represented by $j\{i\} = 1, \ldots, m_i$, where, for the general case,
$m_i = 20 \ \forall i$. The equivalent sets $k \equiv i$ and
$l \equiv j$ are defined, and $k > i$ is required to represent all
unique pairwise interactions. With this in mind, the binary variables
$y_i^j$ and $y_k^l$ can be introduced to indicate the possible
mutations at a given position. That is, the $y_i^j$ variable will
indicate which type of amino acid is active at a position in the
sequence by taking the value of 1 for that specification. Then, the
formulation, for which the goal is to minimize