Biomedical Engineering Reference
In-Depth Information
of a given length. Conversely, a probabilistic framework is more expressive, since it
relaxes the hard constraints of discrete frameworks and associates each module with
a score that combines (e.g., sums) motif and distance scores. The main issues
of probabilistic frameworks are local optima and the interpretability of results.
A recent assessment of eight published methods for module discovery [21] has
shown that no single method performed consistently better than the others in all
situations and that there are still advances to be made in computational module discovery.
In this chapter, we propose an innovative approach to module discovery, which can
be a useful supplement or alternative to other well-known approaches. The idea is
to mine rules which define “strong” spatial associations between single motifs [27].
Single motifs might either be de novo discovered by traditional discovery algorithms
or taken from databases of known motifs.
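In association rule mining, a rule A ⇒ C is conventionally called "strong" when both its support and its confidence reach user-defined minimums. The following minimal sketch illustrates that convention on toy data. It assumes each sequence has already been reduced to a set of symbolic items; the item names, the function names, and the thresholds are hypothetical and are not taken from the method described in this chapter.

```python
from typing import FrozenSet, List

Sequence = FrozenSet[str]


def support(items: FrozenSet[str], sequences: List[Sequence]) -> float:
    """Fraction of sequences that contain every item in `items`."""
    if not sequences:
        return 0.0
    return sum(1 for s in sequences if items <= s) / len(sequences)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               sequences: List[Sequence]) -> float:
    """support(A and C) / support(A); 0.0 when A never occurs."""
    sup_a = support(antecedent, sequences)
    if sup_a == 0.0:
        return 0.0
    return support(antecedent | consequent, sequences) / sup_a


def is_strong(antecedent: FrozenSet[str], consequent: FrozenSet[str],
              sequences: List[Sequence],
              min_sup: float, min_conf: float) -> bool:
    """A rule is 'strong' if it meets both minimum thresholds."""
    return (support(antecedent | consequent, sequences) >= min_sup
            and confidence(antecedent, consequent, sequences) >= min_conf)


# Hypothetical toy data: four sequences described by motif occurrences
# ("m1", "m2") and one spatial item ("m1_before_m2").
TOY_SEQUENCES: List[Sequence] = [
    frozenset({"m1", "m2", "m1_before_m2"}),
    frozenset({"m1", "m2", "m1_before_m2"}),
    frozenset({"m1", "m2"}),
    frozenset({"m1"}),
]
A = frozenset({"m1", "m2"})
C = frozenset({"m1_before_m2"})
```

On this toy data the rule A ⇒ C holds in two of four sequences, and in two of the three sequences where A occurs, so whether it counts as strong depends entirely on the chosen thresholds.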
The spatial relationships considered in this work are the order of motifs along
the DNA sequence and the inter-motif distance between each pair of consecutive
motifs, although the mining method proposed to generate spatial association rules
places no limit on either the number or the nature of spatial relationships. The
association rule mining method is based on an inductive logic programming (ILP) [31]
formulation according to which both data and discovered patterns are represented in
a first-order logic formalism. This formulation also facilitates the accommodation
of diverse sources of domain (or background) knowledge which are expressed in a
declarative way. Indeed, ILP is particularly well suited to bioinformatics tasks due
to its ability both to take into account background knowledge and to work directly
with structured data [30]. This is confirmed by some notable successes in molecular
biology applications, such as predicting carcinogenesis [44, 45].
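To make the two spatial relationships concrete, the sketch below turns a list of motif occurrences on one sequence into first-order-style ground facts: one fact per occurrence, plus an order fact and a distance fact for each pair of consecutive occurrences. The predicate names (`motif`, `follows`, `distance`), the `MotifHit` type, and the choice of measuring distance from the end of one occurrence to the start of the next are all assumptions made for illustration; the source does not specify its encoding here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MotifHit:
    """One motif occurrence (hypothetical representation)."""
    motif: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def spatial_facts(seq_id: str, hits: List[MotifHit]) -> List[str]:
    """Emit ground facts for motif order and inter-motif distance."""
    ordered = sorted(hits, key=lambda h: h.start)
    facts = [f"motif({seq_id}, h{i}, {h.motif})."
             for i, h in enumerate(ordered)]
    for i in range(len(ordered) - 1):
        # Gap between consecutive occurrences; negative if they overlap.
        gap = ordered[i + 1].start - ordered[i].end
        facts.append(f"follows({seq_id}, h{i}, h{i + 1}).")
        facts.append(f"distance({seq_id}, h{i}, h{i + 1}, {gap}).")
    return facts


# Hypothetical input: two motif occurrences given in arbitrary order.
EXAMPLE_HITS = [MotifHit("sp1", 40, 50), MotifHit("tata", 10, 16)]
EXAMPLE_FACTS = spatial_facts("s1", EXAMPLE_HITS)
```

Keeping the data as declarative facts means further relationships could be added as extra predicates without changing the fact-generation loop, which is in the spirit of the formulation described above.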
The proposed approach is based on a discrete framework, which presents several
advantages, the most relevant being the straightforward interpretation of rules, but
also some disadvantages, such as the hard discretization of numerical inter-motif
distances or the choice of a minimum support threshold. To overcome these issues,
some computational solutions have been developed and tested.
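The two disadvantages interact, as the following sketch suggests. It bins inter-motif distances with fixed cut points and then measures how often each label occurs. The bin edges, labels, distances, and threshold are hypothetical values chosen only to expose the boundary effect; the sketch shows the problem, not the solutions developed by the authors.

```python
from bisect import bisect_right
from typing import List

# Hypothetical cut points: [0, 50) short, [50, 100) medium, 100+ long.
BIN_EDGES = [50, 100]
BIN_LABELS = ["short", "medium", "long"]


def discretize(distance: int) -> str:
    """Map a numeric inter-motif distance to a hard interval label."""
    return BIN_LABELS[bisect_right(BIN_EDGES, distance)]


def label_support(distances: List[int], label: str) -> float:
    """Fraction of observed distances that fall into the given bin."""
    if not distances:
        return 0.0
    return sum(1 for d in distances if discretize(d) == label) / len(distances)


# Four distances within 10 bases of each other, straddling the cut at 50.
OBSERVED = [45, 49, 51, 55]
MIN_SUPPORT = 0.6  # hypothetical threshold

FREQUENT_LABELS = [lab for lab in BIN_LABELS
                   if label_support(OBSERVED, lab) >= MIN_SUPPORT]
```

Here the four nearly identical distances are split evenly between "short" and "medium", so neither label reaches the minimum support and no distance pattern is reported, even though the underlying spacing is quite regular.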
The specific features of this approach are:
- An original perspective of module discovery as a spatial association rule mining task;
- A logic-based approach where background knowledge can be expressed in a declarative way;
- A procedure for the automated selection of some parameters which are difficult to properly set;
- Some computational solutions to overcome the discretization issues of discrete approaches.
These features give our module discovery tool several advantages over
competing approaches. First, spatial association rules, which take
the form A ⇒ C, provide insight both into the support of the module (represented
by A ∧ C) and into the confidence of possible predictions of C given A.
Predictions may concern both properties of motifs (e.g., their type) and spatial
relationships (e.g., the inter-motif distance). Second, the declarative knowledge