Biomedical Engineering Reference
It is noteworthy that if $P_1$ is more general than $P_2$ and $P_1$ is not frequent
($\mathit{sup}(P_1, D) < \mathit{minsup}[l]$), then $P_2$ is not frequent either
($\mathit{sup}(P_2, D) < \mathit{minsup}[l]$). This monotonicity property of the
generality relation with respect to the support allows the search space to be pruned
without losing frequent atomsets.
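
The pruning argument above can be illustrated with a minimal sketch. It assumes a strong simplification that is not in the text: atomsets are plain sets of atom names, a unit of analysis is the set of atoms it satisfies, and "more general" means "subset of". The function names, motif labels, and threshold are hypothetical.

```python
from itertools import combinations

def support(pattern, units):
    """Fraction of units of analysis that contain every atom of `pattern`."""
    return sum(1 for unit in units if pattern <= unit) / len(units)

def frequent_atomsets(units, min_sup):
    """Level-wise search that skips candidates with an infrequent generalization."""
    atoms = sorted(set().union(*units))
    frequent = {}    # atomset -> support
    evaluated = []   # every atomset whose support was actually computed
    candidates = [frozenset([a]) for a in atoms]
    while candidates:
        survivors = []
        for cand in candidates:
            evaluated.append(cand)
            s = support(cand, units)
            if s >= min_sup:
                frequent[cand] = s
                survivors.append(cand)
        next_candidates = set()
        for a, b in combinations(survivors, 2):
            union = a | b
            if len(union) != len(a) + 1:
                continue
            # Pruning step: every generalization obtained by dropping one
            # atom must already be frequent, otherwise the candidate cannot be.
            if all(union - {atom} in frequent for atom in union):
                next_candidates.add(union)
        candidates = sorted(next_candidates, key=sorted)
    return frequent, evaluated

# Hypothetical units of analysis described by the motifs they contain.
units = [
    frozenset({"m1", "m2", "m3"}),
    frozenset({"m1", "m2"}),
    frozenset({"m1", "m3"}),
    frozenset({"m2"}),
]
frequent, evaluated = frequent_atomsets(units, min_sup=0.5)
```

In this toy run the atomset {m2, m3} is evaluated and found infrequent, so its specialization {m1, m2, m3} is discarded without its support ever being computed.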
In the inter-level search, atomsets discovered at level $l$ are refined by descending
the generalization hierarchies until task-relevant objects mapped at level $l+1$ are
found. These are the only candidate atomsets considered for evaluation, since other
candidates would not meet the necessary condition for atomsets to be frequent at
level $l+1$, given the condition relating $\mathit{minsup}[l+1]$ to $\mathit{minsup}[l]$
(see Definition 5.4). This way, the search space at level $l+1$ is heavily pruned.
Moreover, information on the units of analysis covered by atomsets at level $l$ can be
used to evaluate the support of atomsets at level $l+1$ more efficiently. Indeed, if a
unit of analysis $D[s]$ is not covered by a pattern $P$ at granularity level $l$, then
it will not be covered by any descendant of $P$ at level $l+1$.
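
A small sketch of the coverage argument, under the same simplifying assumption that patterns are plain sets of atom names. The two-level hierarchy (`children`), the atom labels, and the helper names `cover` and `refine` are all hypothetical and serve only to show that a descendant needs to be tested on the units covered by its parent, not on the whole data set.

```python
from itertools import product

def cover(pattern, units, only=None):
    """Indices of the units containing every atom of `pattern`.
    If `only` is given, just those indices are tested."""
    indices = range(len(units)) if only is None else only
    return {i for i in indices if pattern <= units[i]}

def refine(pattern, children):
    """Level l+1 descendants: each atom is replaced by one of its children."""
    options = [children[atom] for atom in sorted(pattern)]
    return [frozenset(choice) for choice in product(*options)]

# Hypothetical hierarchy: two families at level l, three members at level l+1.
children = {"famA": ["a1", "a2"], "famB": ["b1"]}
parent = {c: p for p, cs in children.items() for c in cs}

# The same four units described at level l+1 and, via `parent`, at level l.
units_fine = [
    frozenset({"a1", "b1"}),
    frozenset({"a2", "b1"}),
    frozenset({"a1"}),
    frozenset({"b1"}),
]
units_coarse = [frozenset(parent[a] for a in u) for u in units_fine]

pattern = frozenset({"famA", "famB"})
covered = cover(pattern, units_coarse)  # units covered at level l

# Descendants are evaluated only on the units covered by their parent.
supports = {
    d: len(cover(d, units_fine, only=covered)) / len(units_fine)
    for d in refine(pattern, children)
}
```

Restricting the test to `covered` gives the same coverage as a full scan, because a unit that fails the parent pattern cannot satisfy any of its descendants.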
Once frequent atomsets have been generated at level $l$, it is possible to generate
strong spatial association rules, i.e., rules whose confidence is higher than a
threshold $\mathit{minconf}[l]$. In particular, each frequent atomset $P$ at level $l$
is partitioned into two atomsets $A$ and $C$ such that $P = A \wedge C$, and the
confidence of the association rule $A \Rightarrow C$ is computed. Different partitions
of $P$ generate different association rules. Those association rules with confidence
lower than $\mathit{minconf}[l]$ are filtered out.
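
The rule-generation step can be sketched as follows. The sketch assumes the usual definition of confidence, sup(A and C) / sup(A), which this passage does not restate, and again treats atomsets as plain sets of atom names. The function name `strong_rules`, the support values, and the threshold are hypothetical.

```python
from itertools import combinations

def strong_rules(frequent, min_conf):
    """Split each frequent atomset P into antecedent A and consequent C
    and keep the rules A => C whose confidence is not below `min_conf`.

    `frequent` maps each frequent atomset to its support and is assumed
    to contain every non-empty subset of each of its keys.
    """
    rules = []
    for pattern, sup_p in frequent.items():
        for k in range(1, len(pattern)):
            for antecedent in combinations(sorted(pattern), k):
                a = frozenset(antecedent)
                c = pattern - a
                confidence = sup_p / frequent[a]  # sup(A and C) / sup(A)
                if confidence >= min_conf:
                    rules.append((a, c, confidence))
    return rules

# Hypothetical supports for two motifs and their co-occurrence.
frequent = {
    frozenset({"m1"}): 0.75,
    frozenset({"m2"}): 0.75,
    frozenset({"m1", "m2"}): 0.5,
}
rules = strong_rules(frequent, min_conf=0.6)
```

Each atomset with n atoms yields 2^n - 2 candidate rules, so in practice the confidence threshold does most of the work in keeping the output manageable.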
We conclude by observing that in real-world applications a large number of frequent
atomsets and strong association rules can be generated, most of which are
uninteresting. This is also true for the module discovery problem (e.g., constituent
motifs with a large inter-motif distance). To prevent this, some pattern constraints
can be expressed in a declarative form and then used to filter out uninteresting
atomsets or spatial association rules [4].
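
As a rough illustration of constraint-based filtering, the sketch below keeps constraints as named predicates, separate from the filtering code. This is not the constraint language of [4]; the pattern fields (`motifs`, `distance`), the constraint names, and the 200-unit distance bound are hypothetical.

```python
def apply_constraints(patterns, constraints):
    """Keep only the patterns that satisfy every declared constraint."""
    return [p for p in patterns if all(check(p) for _, check in constraints)]

# Hypothetical discovered patterns: constituent motifs plus the largest
# distance observed between two of them.
patterns = [
    {"motifs": ("m1", "m2"), "distance": 40},
    {"motifs": ("m1", "m3"), "distance": 900},
    {"motifs": ("m2",), "distance": 0},
]

# Constraints are declared as data, so they can be changed without
# touching the search or the filter.
constraints = [
    ("max_inter_motif_distance", lambda p: p["distance"] <= 200),
    ("at_least_two_motifs", lambda p: len(p["motifs"]) >= 2),
]

interesting = apply_constraints(patterns, constraints)
```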
5.4 Implementation
The development of a module discovery tool that biologists can use effectively
requires the solution of several problems, both methodological and architectural.
Methodological problems involve data pre-processing, namely the discretization of
numerical data, and the automated selection of some critical parameters such as
minimum support. Architectural problems concern the interface of the tool with
the external world, either to acquire data and parameters or to communicate results.
In this section, solutions to these problems are briefly reported.
5.4.1 Choosing the Minimum Support Threshold
Setting the minimum support threshold $\mathit{minsup}$ is not a trivial problem for a
biologist when no a priori knowledge is assumed about structural and functional features