Mining Spatial Association Rules for Composite Motif Discovery - Mathematical Approaches to Polymer Sequence Analysis and Related Problems

Biomedical Engineering Reference


ground atoms to be deduced from data stored in D_E. For instance, the rules in (5.3) entail the following information from the set of Datalog ground atoms in (5.2):

    short_medium_distance(m1, m2),
    short_medium_distance(m2, m3).                                  (5.4)

SPADA adds these entailed Datalog ground atoms to set (5.2), so that atoms with the predicate short_medium_distance can also appear in mined association rules.
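
The entailment step can be illustrated with a minimal Python sketch. The contents of (5.2) and (5.3) are not reproduced in this excerpt, so the ground atoms and the rule below are assumptions chosen only so that the derived atoms match (5.4); the function name `entail_short_medium` is hypothetical and is not part of SPADA.

```python
# Hypothetical ground atoms standing in for (5.2); the real set is not shown here.
facts = {
    ("sequence", "t2"),
    ("part_of", "t2", "m1"),
    ("part_of", "t2", "m2"),
    ("part_of", "t2", "m3"),
    ("distance", "m1", "m2", "short"),
    ("distance", "m2", "m3", "medium"),
}


def entail_short_medium(atoms):
    """Apply the assumed rule
    short_medium_distance(X, Y) :- distance(X, Y, short) or distance(X, Y, medium).
    """
    derived = set()
    for atom in atoms:
        if atom[0] == "distance" and atom[3] in ("short", "medium"):
            derived.add(("short_medium_distance", atom[1], atom[2]))
    return derived


entailed = entail_short_medium(facts)
# The entailed atoms are added to the original set, as described in the text.
extended = facts | entailed
```

Under these assumptions, `entailed` contains exactly the two atoms listed in (5.4), and `extended` is the enlarged set from which association rules would then be mined.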

Spatial association rules discovered by SPADA take the form A ⇒ C, where both A and C are conjunctions of Datalog non-ground atoms. A Datalog non-ground atom is an n-ary predicate symbol applied to n terms (either constants or variables), at least one of which is a variable. For each association rule, there is exactly one variable denoting the whole sequence, and other variables denoting constituent motifs. An example of a spatial association rule is the following:

    sequence(T), part_of(T, M1), is_a(M1, x), distance(M1, M2, short),
    M1 ≠ M2 ⇒ is_a(M2, y)                                           (5.5)

where variable T denotes a sequence, while variables M1 and M2 denote two distinct occurrences of single motifs (M1 ≠ M2) of type x and y, respectively. With reference to the sequence described in (5.2), T corresponds to t2, while the two distinct occurrences of single motifs M1 and M2 correspond to m1 and m2, respectively. This association rule makes it possible to infer which single motif follows a single motif of type x at a short distance. The uncertainty of the inference is quantified by the confidence of the association rule.
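
To make the role of confidence concrete, the following Python sketch evaluates a rule shaped like (5.5) on a toy data set. It uses the standard definitions (support is the fraction of sequences satisfying both sides, confidence is the fraction of sequences satisfying the left-hand side that also satisfy the right-hand side) and counts each sequence once. The data, the dictionary layout, and the helper `matches` are illustrative assumptions; only t2 with m1, m2 and m3 echoes the running example, and the motif types assigned to them are invented.

```python
from itertools import permutations

# Hypothetical toy data: for each sequence, its motif occurrences
# (occurrence id -> motif type) and the ordered pairs at short distance.
sequences = {
    "t1": {"motifs": {"a1": "x", "a2": "y"}, "short": {("a1", "a2")}},
    "t2": {"motifs": {"m1": "x", "m2": "y", "m3": "z"}, "short": {("m1", "m2")}},
    "t3": {"motifs": {"b1": "x", "b2": "z"}, "short": {("b1", "b2")}},
    "t4": {"motifs": {"c1": "y", "c2": "z"}, "short": set()},
}


def matches(seq, head_required):
    """Check whether some pair M1 != M2 satisfies the rule body
    (is_a(M1, x) and distance(M1, M2, short)) and, if head_required,
    also the rule head is_a(M2, y)."""
    motifs, short = seq["motifs"], seq["short"]
    for m1, m2 in permutations(motifs, 2):  # permutations enforce M1 != M2
        if motifs[m1] == "x" and (m1, m2) in short:
            if not head_required or motifs[m2] == "y":
                return True
    return False


n = len(sequences)
body = sum(matches(s, False) for s in sequences.values())  # sequences matching A
both = sum(matches(s, True) for s in sequences.values())   # sequences matching A and C
support = both / n
confidence = both / body
```

On this toy data three sequences satisfy the left-hand side and two of them also satisfy the right-hand side, so the sketch yields a support of 0.5 and a confidence of 2/3.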

Details on the association rule discovery algorithm implemented in SPADA are

reported in the next section.

5.3 SPADA: Pattern Space and Search Procedure

In SPADA, the set O of spatial objects is partitioned into a set S of reference (or target) objects and m sets R_k, 1 ≤ k ≤ m, of task-relevant (or non-target) objects. Reference objects are the main subject of analysis and contribute to the computation of the support of a pattern, while task-relevant objects are related to the reference objects and contribute to accounting for the variation, i.e., they can be involved in a pattern. In the sequence described in (5.2), the constant t2 denotes a reference object, while the constants m1, m2 and m3 denote three task-relevant objects. In this case, there is only one set R_1 of task-relevant objects.
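
A minimal Python sketch of this partition for the running example follows. The object identifiers come from the text; representing S and the sets R_k as Python sets, and the helper `is_partition`, are illustrative assumptions rather than SPADA's actual data structures.

```python
# Reference objects (sequences) and task-relevant objects (motif occurrences).
S = {"t2"}
R = {1: {"m1", "m2", "m3"}}  # task-relevant sets R_k, here with m = 1

# The set O of all spatial objects is the union of S and every R_k.
O = S | set().union(*R.values())


def is_partition(universe, blocks):
    """Return True if the blocks are pairwise disjoint and cover the universe."""
    seen = set()
    for block in blocks:
        if seen & block:  # an object would belong to two blocks
            return False
        seen |= block
    return seen == universe


valid = is_partition(O, [S, *R.values()])
```

Because support is computed over reference objects only, a pattern's support in this example would be counted over the elements of S, while the elements of R_1 can only appear inside the pattern.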

SPADA is the only ILP system which addresses the task of relational frequent pattern discovery by dealing properly with concept hierarchies. Indeed, for each set R_k, a generalization hierarchy H_k is defined together with a function, indexed by k, which maps objects in H_k into a set of granularity levels {1, ..., L}. For instance, with
