$C \leftarrow A_1 \;\&\; A_2 \;\&\; \dots \;\&\; A_n$
Although several types of relationships between
amino-acid residues could have been studied, in
this work we focused on the hydrophobic
residues. The hydrophobic effect is considered
to be one of the major driving forces in protein
folding (Dill, 1990; Kyte, 2003; Lins & Brasseur,
1995; Pace, 1996). It arises from entropically
unfavourable arrangements where non-polar side
chains contact water, thus favouring polypeptide
arrangements in which the side chains of hydro-
phobic amino-acids are packed in the interior of
the protein. In fact, about 80% of the hydrophobic
residues' side chains are buried inside a protein
when it folds (Pace, 1996). Thus, hydrophobic
residues usually exhibit small values of solvent
exposure (below 25%) in the protein's folded
state. We set out to find groups of residues, in
particular hydrophobic ones, that change solvent
exposure in a coordinated fashion during one
unfolding simulation or across several unfolding
simulations, because such groups might be important
in defining folding nuclei for a protein (Brito, 2004;
Hammarström & Carlsson, 2000). For each data set,
association rules were extracted such that only
hydrophobic residues with SASA values ≤ 25%
were involved. Because interactions between
hydrophobic groups are weak, we required each
association rule to involve a minimum of four
residues. Association rules were extracted with
a minimum support of 30% and a minimum
confidence of 90%.
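The selection criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used with CAREN: the residue labels (such as `LEU12`), the contents of `HYDROPHOBIC`, and the helper names `frame_to_items` and `rule_is_kept` are all hypothetical, and the four-residue minimum is assumed to count the antecedent items plus the single consequent.

```python
from typing import Dict, FrozenSet

# Assumption: the text does not list which residue types it treats as
# hydrophobic, so this set is purely illustrative.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET"}

# Thresholds taken from the text.
SASA_CUTOFF = 25.0      # percent solvent exposure
MIN_RESIDUES = 4        # residues per rule
MIN_SUPPORT = 0.30
MIN_CONFIDENCE = 0.90


def frame_to_items(frame: Dict[str, float]) -> FrozenSet[str]:
    """Keep hydrophobic residues whose SASA is <= 25% in one frame.

    Keys are hypothetical labels made of a three-letter residue type
    followed by a position, e.g. "LEU12".
    """
    return frozenset(
        label
        for label, sasa in frame.items()
        if label[:3] in HYDROPHOBIC and sasa <= SASA_CUTOFF
    )


def rule_is_kept(antecedent: FrozenSet[str], support: float,
                 confidence: float) -> bool:
    """Apply the size, support, and confidence thresholds to one rule."""
    n_residues = len(antecedent) + 1  # antecedent items plus one consequent
    return (
        n_residues >= MIN_RESIDUES
        and support >= MIN_SUPPORT
        and confidence >= MIN_CONFIDENCE
    )


# Toy data: SASA percentages for two simulation frames (invented values).
frames = [
    {"LEU12": 10.0, "VAL14": 5.0, "ILE26": 30.0, "SER50": 3.0},
    {"LEU12": 40.0, "VAL14": 20.0, "ILE26": 12.0, "SER50": 60.0},
]
transactions = [frame_to_items(f) for f in frames]
```

In this toy input, `SER50` is dropped in both frames because it is not in the illustrative hydrophobic set, and `ILE26` is dropped from the first frame because its exposure exceeds 25%.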
An association rule is a pair of disjoint itemsets
(sets of items): the antecedent ($A_1, A_2, \dots, A_n$) and
the consequent ($C$). In general, the consequent
may be a set of items, but here we only consider
rules with single-item consequents. In the specific
problem of SASA data analysis an item is repre-
sented by the pair residue/SASA. Each association
rule is associated with two values expressing its
degree of uncertainty. The first value is called the
support for the rule, and represents the frequency
of co-occurrence of all items appearing in the rule.
The second value is the confidence of the rule, which
represents its accuracy. Confidence is calculated
as the ratio between the support of the rule and
the support of the antecedent.
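As a small worked example of these two measures, the following Python sketch computes support and confidence over a toy list of transactions. The item names and the functions `support` and `confidence` are hypothetical, and treating each transaction as the set of items observed together in one record is an assumption made for illustration.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: str,
               transactions: List[Transaction]) -> float:
    """Support of the whole rule divided by support of the antecedent."""
    rule_support = support(antecedent | {consequent}, transactions)
    antecedent_support = support(antecedent, transactions)
    if antecedent_support == 0:
        return 0.0  # the antecedent never occurs, so the rule is undefined
    return rule_support / antecedent_support


# Toy transactions with placeholder item names.
transactions = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "B"}),
    frozenset({"A", "B", "C"}),
    frozenset({"B", "C"}),
]

# Rule C <- A & B: all three items co-occur in 2 of 4 transactions.
rule_support = support(frozenset({"A", "B", "C"}), transactions)
# The antecedent {A, B} occurs in 3 of 4, so confidence is (2/4) / (3/4) = 2/3.
rule_confidence = confidence(frozenset({"A", "B"}), "C", transactions)
```

With a minimum support of 30% and a minimum confidence of 90%, this toy rule would pass the support threshold but fail the confidence threshold.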
Finding relations between amino-acid residues
belonging to the same and/or different chemical
classes is of great interest in the understanding of
the protein folding problem. In the present work,
the amino-acids were divided into five different
classes (hydrophobic, hydrophilic, polar with
positive charge, polar with negative charge and
aromatic), and association rules were extracted
among the five classes to study relationships linked
to the main forces driving the folding process: (i)
association rules among hydrophobic residues,
(ii) association rules among hydrophilic and hy-
drophobic residues, (iii) association rules among
aromatic residues, and (iv) association rules among
polar charged residues. Rules were extracted using
CAREN (Azevedo, 2003). CAREN is a Java-based
implementation of an association rule engine
that uses a new variant of the ECLAT algorithm
(Zaki, 2000). Several features for rule derivation
and selection are available in CAREN, namely
antecedent and consequent filtering by item or
attribute specification, minimum and maximum
number of items in a rule, and different metrics.
The χ² test is one such metric. It was applied
during itemset mining because it significantly reduces
the number of relevant itemsets.
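To illustrate how a χ² test can act as a filter, the sketch below computes the Pearson χ² statistic for a 2×2 contingency table of antecedent versus consequent occurrence. It is a generic textbook formulation, not CAREN's implementation: how CAREN applies the test internally is not described here, and the function name, the counts, and the 5% significance level are assumptions.

```python
# Critical value of the chi-square distribution with 1 degree of freedom
# at the 5% significance level (assumed level, not taken from the text).
CHI2_CRITICAL_DF1_P05 = 3.841


def chi_square_2x2(n_both: int, n_ante_only: int,
                   n_cons_only: int, n_neither: int) -> float:
    """Pearson chi-square statistic for a 2x2 contingency table.

    Rows: antecedent present / absent.
    Columns: consequent present / absent.
    """
    observed = [[n_both, n_ante_only], [n_cons_only, n_neither]]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    total = sum(row_totals)
    if total == 0 or 0 in row_totals or 0 in col_totals:
        return 0.0  # degenerate table: no evidence of dependence
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Invented counts over 100 transactions: the two sides co-occur far more
# often than independence would predict (expected count 25 per cell).
statistic = chi_square_2x2(40, 10, 10, 40)
is_dependent = statistic > CHI2_CRITICAL_DF1_P05
```

An itemset whose statistic falls below the critical value would be discarded under this scheme, which is one way such a test can shrink the set of candidates.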
Results
Here, we report and compare the results obtained
by the application of two data mining techniques
- hierarchical clustering and association rules -
to the analysis of solvent accessible surface area
(SASA) variation profiles of individual amino-
acid residues of the protein transthyretin across
five molecular dynamics unfolding simulations.