Mining Spatial Association Rules for Composite Motif Discovery - Mathematical Approaches to Polymer Sequence Analysis and Related Problems


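It is noteworthy that if $P_1 \succeq P_2$ (i.e., $P_1$ is more general than $P_2$) and $P_1$ is not frequent ($sup(P_1, D) < \mathit{minsup}[l]$, where $\mathit{minsup}[l]$ denotes the minimum support threshold at level $l$), then $P_2$ is not frequent either ($sup(P_2, D) < \mathit{minsup}[l]$). This monotonicity property of $\succeq$ with respect to the support allows for pruning the search space without losing frequent atomsets.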
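The pruning effect of this property can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not part of the original text: atomsets are plain sets of atom labels, the generality relation is approximated by set inclusion (a smaller set is more general), and each unit of analysis is a set of the atoms it satisfies. The names `support` and `frequent_atomsets` are hypothetical.

```python
from itertools import combinations


def support(pattern, units):
    """Fraction of units of analysis that contain every atom of the pattern."""
    return sum(1 for u in units if pattern <= u) / len(units)


def frequent_atomsets(units, min_sup):
    """Level-wise search that never evaluates a refinement of an infrequent atomset.

    Assumption: generality is modelled as set inclusion, a simplification
    of the generality order used for relational atomsets.
    """
    atoms = sorted(set().union(*units))
    frequent = []
    current = [frozenset([a]) for a in atoms]
    while current:
        survivors = [p for p in current if support(p, units) >= min_sup]
        frequent.extend(survivors)
        kept = set(survivors)
        candidates = set()
        # Join two survivors that differ in one atom; keep the result only if
        # every generalization that is one atom shorter is itself frequent.
        for p, q in combinations(survivors, 2):
            c = p | q
            if len(c) == len(p) + 1 and all(c - {a} in kept for a in c):
                candidates.add(c)
        current = sorted(candidates, key=sorted)
    return frequent


units = [{"m1", "m2", "m3"}, {"m1", "m2"}, {"m1", "m3"}, {"m2"}]
result = frequent_atomsets(units, min_sup=0.5)
```

In this toy data set `{m2, m3}` is infrequent, so its refinement `{m1, m2, m3}` is discarded without its support ever being computed.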

In the inter-level search, atomsets discovered at level $l$ are refined by descending the generalization hierarchies until task-relevant objects mapped at level $l+1$ are found. These are the only candidate atomsets considered for evaluation, since other candidates would not meet the necessary condition for atomsets to be frequent at level $l+1$ when $\mathit{minsup}[l+1] \le \mathit{minsup}[l]$ (see Definition 5.4). This way, the search space at level $l+1$ is heavily pruned. Moreover, information on the units of analysis covered by atomsets at level $l$ can be used to make the evaluation of the support of atomsets at level $l+1$ more efficient. Indeed, if a unit of analysis $D[s]$ is not covered by a pattern $P$ at granularity level $l$, then it will not be covered by any descendant of $P$ at level $l+1$.
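The reuse of coverage information across levels can be sketched as follows. This is an illustration under assumptions not taken from the original text: a pattern is a set of class labels, `children` is a hypothetical dictionary mapping each level-$l$ label to its level-$(l+1)$ specializations, and `units_next` maps the identifier of each unit of analysis to the set of level-$(l+1)$ labels it satisfies.

```python
from itertools import product


def refine(pattern, children):
    """Level l+1 descendants of a pattern: replace every label by one of its children."""
    labels = sorted(pattern)
    return [frozenset(combo) for combo in product(*(children[x] for x in labels))]


def evaluate_descendants(pattern, covered, units_next, children, min_sup_next):
    """Return the frequent descendants of `pattern` together with their covers.

    Only the units in `covered` (those covered by the parent at level l) are
    tested, because a unit not covered by the parent cannot be covered by a
    descendant. Support is still measured against all units of analysis.
    """
    n = len(units_next)
    frequent = {}
    for desc in refine(pattern, children):
        cover = {s for s in covered if desc <= units_next[s]}
        if len(cover) / n >= min_sup_next:
            frequent[desc] = cover
    return frequent


children = {"motif_A": ["a1", "a2"], "motif_B": ["b1"]}
units_next = {0: {"a1", "b1"}, 1: {"a2", "b1"}, 2: {"a1", "b1"}, 3: {"a1"}}
parent = frozenset({"motif_A", "motif_B"})
parent_cover = {0, 1, 2}  # unit 3 has no motif_B, so the parent does not cover it
found = evaluate_descendants(parent, parent_cover, units_next, children, 0.5)
```

Unit 3 is never inspected at level $l+1$, since the parent pattern already failed to cover it at level $l$.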

Once frequent atomsets have been generated at level $l$, it is possible to generate strong spatial association rules, i.e., rules whose confidence is at least a threshold $\mathit{minconf}[l]$. In particular, each frequent atomset $P$ at level $l$ is partitioned into two atomsets $A$ and $C$ such that $P = A \wedge C$, and the confidence of the association rule $A \Rightarrow C$ is computed. Different partitions of $P$ generate different association rules. Those association rules with confidence lower than $\mathit{minconf}[l]$ are filtered out.
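A minimal sketch of this rule-generation step is given below. It assumes the usual definition of confidence, namely the support of $A \wedge C$ divided by the support of $A$, and again models atomsets as plain sets of atom labels. The function names are hypothetical.

```python
from itertools import combinations


def support(pattern, units):
    """Fraction of units of analysis that contain every atom of the pattern."""
    return sum(1 for u in units if pattern <= u) / len(units)


def strong_rules(pattern, units, min_conf):
    """Enumerate the partitions (A, C) of a frequent atomset and keep strong rules.

    Assumption: `pattern` is frequent, so every antecedent A has non-zero support.
    """
    atoms = sorted(pattern)
    sup_p = support(pattern, units)
    rules = []
    for k in range(1, len(atoms)):          # A and C must both be non-empty
        for body in combinations(atoms, k):
            a = frozenset(body)
            c = pattern - a
            conf = sup_p / support(a, units)
            if conf >= min_conf:
                rules.append((a, c, conf))
    return rules


units = [{"x", "y"}, {"x", "y"}, {"x"}, {"x"}]
rules = strong_rules(frozenset({"x", "y"}), units, min_conf=0.8)
```

With these four units the atomset `{x, y}` yields two candidate rules: `x => y` has confidence 0.5 and is filtered out, while `y => x` has confidence 1.0 and is kept.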

We conclude by observing that in real-world applications a large number of frequent atomsets and strong association rules can be generated, most of which are uninteresting. This is also true for the module discovery problem (e.g., constituent motifs with a large inter-motif distance). To prevent this, some pattern constraints can be expressed in a declarative form and then used to filter out uninteresting atomsets or spatial association rules [4].
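As a rough illustration of constraint-based filtering, the sketch below represents each constraint as a small predicate and keeps only the patterns that satisfy all of them. The pattern representation (a dictionary with `motifs` and `distances` keys) and the constraint names `max_distance` and `min_motifs` are assumptions made for this example. They do not reproduce the constraint language of [4].

```python
def max_distance(limit):
    """Constraint: no inter-motif distance in the pattern exceeds `limit`."""
    return lambda p: all(d <= limit for d in p["distances"])


def min_motifs(count):
    """Constraint: the pattern involves at least `count` motifs."""
    return lambda p: len(p["motifs"]) >= count


def filter_patterns(patterns, constraints):
    """Keep only the patterns that satisfy every constraint predicate."""
    return [p for p in patterns if all(check(p) for check in constraints)]


patterns = [
    {"motifs": ["m1", "m2"], "distances": [30]},
    {"motifs": ["m1", "m3"], "distances": [900]},  # motifs too far apart
    {"motifs": ["m2"], "distances": []},           # not a composite pattern
]
kept = filter_patterns(patterns, [max_distance(100), min_motifs(2)])
```

Stating constraints as independent predicates keeps them declarative: new filters can be added without changing the search procedure.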

5.4 Implementation

The development of a module discovery tool that biologists can use effectively demands the solution of several problems, both methodological and architectural. Methodological problems involve data pre-processing, namely discretization of numerical data, and the automated selection of some critical parameters such as minimum support. Architectural problems concern the interface of the tool with the external world, either to acquire data and parameters or to communicate results. In this section, solutions to these problems are briefly reported.

5.4.1 Choosing the Minimum Support Threshold

Setting the minimum support threshold $\mathit{minsup}$ is not a trivial problem for a biologist when assuming no a priori knowledge about structural and functional features
