Database Reference
In-Depth Information
account and preserves more structural information than simply counting occurrence
of features inside the window.
Calculating p value of a Feature Vector To calculate p value of a feature vector,
we model the occurrence of a feature vector x in a feature vector space formulated by
a random graph. The frequency distribution of a vector is generated using the prior
probabilities of features obtained empirically. Given a feature vector x
=
[ x 1 , ... , x n ],
the probability of x occurring in a random feature vector y
=
[ y 1 , ... , y n ] can be
expressed as a joint probability
P ( x )
=
P ( y 1
x 1 , ... , y n
x n ) .
(13.11)
To simplify the calculation, we assume independence of the features. As a result,
Eq. ( 13.11 ) can be expressed as a product of the individual probabilities, where
n
P ( x )
=
P ( y i
x i ) .
(13.12)
i
=
1
Once P ( x ) is known, the support of x in a database of random feature vectors
can be modeled as a binomial distribution. To illustrate, a random vector can be
viewed as a trial and x occurring in it as “success”. A database consisting m feature
vectors will involve m trials for x . The support of x in the database is the number of
successes. Therefore, the probability of x having a support μ is
C m P ( x ) μ (1
P ( x )) m μ .
P ( x ; μ )
=
(13.13)
The probability distribution function (abbr. pdf) of x can be generated from
Eq. ( 13.13 ) by varying μ in the range [0, m ]. Therefore, given an observed sup-
port μ 0 of x , its p value can be calculated by measuring the area under the pdf in the
range [ μ 0 , m ], which is
m
p - value ( x , μ 0 )
=
P ( x ; i ) .
(13.14)
i = μ 0
Identifying Regions of Interest With the conversion of graphs into feature vectors,
and a model to evaluate significance of a graph region in the feature space, the next
step is to explore how the feature vectors can be analyzed to extract the significant
regions. Based on the feature vector representation, the presence of a “common”
sub-feature vector among a set of graphs points to a common subgraph. Similarly,
the absence of a “common” sub-feature vector indicates the non-existence of any
common subgraph. Mathematically, the floor of the feature vectors produces the
“common” sub-feature vector.
Definition 13.10
(Floor of Vectors) The floor of a set of vectors
{
v 1 , ... , v m }
is
a vector v f where v f i
1, ... , n, n is the number of
dimensions of a vector. Ceiling of a set of vectors is defined analogously.
=
min ( v 1 i , ... , v m i ) for i
=
Search WWH ::




Custom Search