Table 1. Subspaces with varied density

      a  b  c  d  e  f
  1   0  0  0  0  0  1
  2   0  0  0  0  0  1
  3   0  0  0  0  1  1
  4   0  0  0  1  1  1
  5   0  0  0  1  1  1
  6   0  0  0  1  1  1
  7   0  0  0  1  1  0
  8   0  0  0  1  0  0
  9   1  1  1  0  0  0
 10   1  1  1  0  0  0

However, patterns with higher dimensionality tend to have lower frequencies,
so using the same threshold value for all patterns risks losing patterns in higher
dimensional spaces. Furthermore, patterns with the same dimensionality may
need different frequency threshold values for various reasons. For example,
a pattern with higher frequency in very dense dimensions may not be as
informative and interesting as a pattern with lower frequency in very sparse
dimensions. Setting a relatively high frequency threshold tends to bias the
search algorithm to favor patterns in dense subspaces only, while patterns in
less dense subspaces are neglected. Consider the example shown in Table 1.
Each column denotes one of the six attributes ( a,b,c,d,e,f ), and each row
denotes one object (data point). An entry '1' in row i and column j denotes
that object i has attribute j. There is a pattern in subspace {abc} that
contains two instances {9, 10}, and subspace {def} has another pattern
containing three instances {4, 5, 6}. If we set the minimum frequency
threshold to be 3, we lose the pattern in {abc}. However, this pattern in
{abc} may be more interesting than the one in {def}, considering the fact
that the number of '1's in attributes a, b, c is much smaller than in
attributes d, e, f. Actually, all instances that have entry '1' in a also
have entry '1' in b and c, and this may suggest a strong correlation between
a, b, c, and also a strong correlation between instances 9 and 10. On the
other hand, although the pattern in {def} has a larger frequency, it does
not suggest such strong correlations either between attributes d, e, f or
between instances 4-6. So we suggest that a smaller frequency threshold
should be chosen for subspaces with lower densities, that is, subspaces
with fewer '1' entries.
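
The Table 1 example can be checked mechanically. The following is a minimal sketch, not the chapter's algorithm: it encodes Table 1 as strings and uses hypothetical helper names (support, ones_per_attribute, frequent) to list the objects supporting each subspace, count the '1' entries per attribute, and show which subspaces survive a single global frequency threshold.

```python
# Table 1 as a binary object-by-attribute matrix (rows are objects 1..10).
ATTRS = "abcdef"
ROWS = [
    "000001", "000001", "000011", "000111", "000111",
    "000111", "000110", "000100", "111000", "111000",
]

def support(subspace):
    """Objects (1-based) that have a '1' in every attribute of the subspace."""
    cols = [ATTRS.index(a) for a in subspace]
    return [i + 1 for i, row in enumerate(ROWS)
            if all(row[c] == "1" for c in cols)]

def ones_per_attribute(subspace):
    """Number of '1' entries in each attribute (column) of the subspace."""
    return {a: sum(row[ATTRS.index(a)] == "1" for row in ROWS)
            for a in subspace}

def frequent(subspaces, min_freq):
    """Subspaces whose supporting object set has at least min_freq objects."""
    return [s for s in subspaces if len(support(s)) >= min_freq]

if __name__ == "__main__":
    for s in ("abc", "def"):
        print(s, support(s), ones_per_attribute(s))
    print("threshold 3:", frequent(["abc", "def"], 3))
    print("threshold 2:", frequent(["abc", "def"], 2))
```

With a threshold of 3 only {def} is kept, even though every '1' in columns a, b, c belongs to the {abc} pattern, while columns d, e, f contain many '1' entries outside the {def} pattern.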
We propose a weighted density measure in this chapter, which captures
the requirement to use a smaller density threshold for less dense subspaces.
We also present an efficient search algorithm to find all patterns satisfying a
minimum weighted density threshold.
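
This section does not define the weighted density measure itself, so the sketch below is only an illustration of the general idea, under an assumption of our own: the pattern frequency is divided by the mean number of '1' entries per attribute in the subspace. The function name density_normalized_frequency and this particular normalization are hypothetical and should not be read as the chapter's definition.

```python
# Table 1 again, so that this sketch is self-contained.
ATTRS = "abcdef"
ROWS = [
    "000001", "000001", "000011", "000111", "000111",
    "000111", "000110", "000100", "111000", "111000",
]

def density_normalized_frequency(subspace):
    """Hypothetical score (an assumption, not the chapter's measure):
    pattern frequency divided by the mean count of '1's per attribute."""
    cols = [ATTRS.index(a) for a in subspace]
    # Frequency: objects with a '1' in every attribute of the subspace.
    freq = sum(all(row[c] == "1" for c in cols) for row in ROWS)
    # Density proxy: average number of '1' entries over the chosen columns.
    total_ones = sum(row[c] == "1" for row in ROWS for c in cols)
    mean_ones = total_ones / len(cols)
    return freq / mean_ones if mean_ones else 0.0

if __name__ == "__main__":
    for s in ("abc", "def"):
        print(s, density_normalized_frequency(s))
```

Under this assumed normalization, {abc} scores higher than {def} despite its lower raw frequency, so a single threshold on the normalized score can retain the pattern in the sparser subspace.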
Most algorithms for finding closed patterns report only the dimensions in
which the patterns occur, without explicitly listing all the objects that are
contained in the patterns. However, the object space of the patterns is crucial