An Algorithm for Mining Weighted Dense Maximal 1-Complete Regions - Data Mining: Foundations and Practice - page 44

Databases Reference

In-Depth Information

1

Mushroom

Chess

Cog

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0

0

20

40

60

80

100

120

attribute id

Fig. 5. Density distribution for all attributes

certain threshold value. For example, the least dense attribute in COG has

density 0 . 11. If we want to find weighted dense clusters that have this least

dense attribute, the minimum density threshold must be set to be no larger

than 0 . 11. However, our tests show that for Chess and COG, CLOSET + runs

out of memory for these low threshold values. For Mushroom, CLOSET + can

finish the mining task for all threshold values.

Our algorithm uses almost the same amount of memory for all weighted

density threshold values, since the computation of the next cluster depends

only on the current cluster and not on any other previously found ones.

As shown in Fig. 6, our algorithm uses almost the same amount of mem-

ory for all weighted density threshold values for all datasets. Compared with

CLOSET +, our algorithms uses much less memory on Mushroom. For Chess

and COG, the difference is more significant as CLOSET + cannot finish the

task due to insu cient memory.

We also compared the running time of our algorithm with CLOSET +on

the Mushroom data. Since CLOSET + runs out of memory on Chess and Cog,

we only report the running time for our algorithm. In order to find weighted

dense clusters in the least dense subspaces, CLOSET + needs to find almost

all dense regions, which explains why its running time is almost constant for all

threshold values. Even if we want to find all the maximal 1-complete regions

in the data, our algorithm is still faster than CLOSET +.

Figure 9 shows the total number of clusters being found for various

weighted density threshold values. For all three datasets, the running time

curves as shown in Figs. 7 and 8 fit very well with the curves in Fig. 9. This

suggests that our algorithm has a linear time complexity with the number of

clusters being found.

Next Page

Data Mining: Foundations and Practice

Search WWH ::

Custom Search

Home