Databases Reference
In-Depth Information
An Algorithm for Mining Weighted Dense
Maximal 1-Complete Regions
Haiyun Bian and Raj Bhatnagar
Department of ECECS, University of Cincinnati, Cincinnati, OH 45221, USA
bianh@ececs.uc.edu, raj.bhatnagar@uc.edu
Summary. We propose a new search algorithm for a special type of subspace
clusters, called maximal 1-complete regions, from high dimensional binary valued
datasets. Our algorithm is suitable for dense datasets, where the number of maximal
1-complete regions is much larger than the number of objects in the datasets. Unlike
other algorithms that find clusters only in relatively dense subspaces, our algorithm
finds clusters in all subspaces. We introduce the concept of weighted density in or-
der to find interesting clusters in relatively sparse subspaces. Experimental results
show that our algorithm is very e cient, and uses much less memory than other
algorithms.
1 Introduction
Frequency has been used for finding interesting patterns in various data min-
ing problems, such as the minimum support threshold used in mining frequent
itemsets [2,3] and the minimum density defined in mining subspace clusters [1].
A priori-like algorithms [1] perform levelwise searches for all patterns having
enough frequencies (either support or density) starting from single dimen-
sions, and prune the search space based on the rationale that in order for a
k− dimensional pattern to be frequent, all its ( k− 1) − dimensional sub-patterns
must also be frequent. A large frequency threshold is usually set in most of the
algorithms to control the exponential growth of the search space as a function
of the highest dimensionality of the frequent patterns.
Closed patterns was proposed [7] to reduce the number of frequent patterns
being returned by the algorithm without losing any information. Mining closed
patterns is lossless in the sense that all frequent patterns can be inferred
from the set of closed patterns. Most algorithms proposed for mining closed
patterns require all candidates found so far to be kept in memory to avoid
duplicates [9, 11, 12]. These algorithms also require the minimum frequency
threshold value to be specified before the algorithms are run, and the same
value is used to prune off candidates for patterns in all subspaces.
Search WWH ::




Custom Search