Databases Reference
In-Depth Information
1
Mushroom
Chess
Cog
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
20
40
60
80
100
120
attribute id
Fig. 5. Density distribution for all attributes
certain threshold value. For example, the least dense attribute in COG has
density 0 . 11. If we want to find weighted dense clusters that have this least
dense attribute, the minimum density threshold must be set to be no larger
than 0 . 11. However, our tests show that for Chess and COG, CLOSET + runs
out of memory for these low threshold values. For Mushroom, CLOSET + can
finish the mining task for all threshold values.
Our algorithm uses almost the same amount of memory for all weighted
density threshold values, since the computation of the next cluster depends
only on the current cluster and not on any other previously found ones.
As shown in Fig. 6, our algorithm uses almost the same amount of mem-
ory for all weighted density threshold values for all datasets. Compared with
CLOSET +, our algorithms uses much less memory on Mushroom. For Chess
and COG, the difference is more significant as CLOSET + cannot finish the
task due to insu cient memory.
We also compared the running time of our algorithm with CLOSET +on
the Mushroom data. Since CLOSET + runs out of memory on Chess and Cog,
we only report the running time for our algorithm. In order to find weighted
dense clusters in the least dense subspaces, CLOSET + needs to find almost
all dense regions, which explains why its running time is almost constant for all
threshold values. Even if we want to find all the maximal 1-complete regions
in the data, our algorithm is still faster than CLOSET +.
Figure 9 shows the total number of clusters being found for various
weighted density threshold values. For all three datasets, the running time
curves as shown in Figs. 7 and 8 fit very well with the curves in Fig. 9. This
suggests that our algorithm has a linear time complexity with the number of
clusters being found.
 
Search WWH ::




Custom Search