Mining Graph Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

account and preserves more structural information than simply counting occurrence

of features inside the window.

Calculating p value of a Feature Vector To calculate p value of a feature vector,

we model the occurrence of a feature vector x in a feature vector space formulated by

a random graph. The frequency distribution of a vector is generated using the prior

probabilities of features obtained empirically. Given a feature vector x

[ x 1 , ... , x n ],

the probability of x occurring in a random feature vector y

[ y 1 , ... , y n ] can be

expressed as a joint probability

P ( x )

P ( y 1 ≥

x 1 , ... , y n ≥

x n ) .

(13.11)

To simplify the calculation, we assume independence of the features. As a result,

Eq. ( 13.11 ) can be expressed as a product of the individual probabilities, where

P ( x )

P ( y i ≥

x i ) .

(13.12)

Once P ( x ) is known, the support of x in a database of random feature vectors

can be modeled as a binomial distribution. To illustrate, a random vector can be

viewed as a trial and x occurring in it as “success”. A database consisting m feature

vectors will involve m trials for x . The support of x in the database is the number of

successes. Therefore, the probability of x having a support μ is

C m P ( x ) μ (1

P ( x )) m − μ .

P ( x ; μ )

−

(13.13)

The probability distribution function (abbr. pdf) of x can be generated from

Eq. ( 13.13 ) by varying μ in the range [0, m ]. Therefore, given an observed sup-

port μ 0 of x , its p value can be calculated by measuring the area under the pdf in the

range [ μ 0 , m ], which is

p - value ( x , μ 0 )

P ( x ; i ) .

(13.14)

i = μ 0

Identifying Regions of Interest With the conversion of graphs into feature vectors,

and a model to evaluate significance of a graph region in the feature space, the next

step is to explore how the feature vectors can be analyzed to extract the significant

regions. Based on the feature vector representation, the presence of a “common”

sub-feature vector among a set of graphs points to a common subgraph. Similarly,

the absence of a “common” sub-feature vector indicates the non-existence of any

common subgraph. Mathematically, the floor of the feature vectors produces the

“common” sub-feature vector.

Definition 13.10

(Floor of Vectors) The floor of a set of vectors

{

v 1 , ... , v m }

a vector v f where v f i

1, ... , n, n is the number of

dimensions of a vector. Ceiling of a set of vectors is defined analogously.

min ( v 1 i , ... , v m i ) for i

Frequent Pattern Mining

Search WWH ::

Custom Search

Home