An Algorithm for Mining Weighted Dense Maximal 1-Complete Regions - Data Mining: Foundations and Practice

Databases Reference

In-Depth Information

An Algorithm for Mining Weighted Dense

Maximal 1-Complete Regions

Haiyun Bian and Raj Bhatnagar

Department of ECECS, University of Cincinnati, Cincinnati, OH 45221, USA

bianh@ececs.uc.edu, raj.bhatnagar@uc.edu

Summary. We propose a new search algorithm for a special type of subspace

clusters, called maximal 1-complete regions, from high dimensional binary valued

datasets. Our algorithm is suitable for dense datasets, where the number of maximal

1-complete regions is much larger than the number of objects in the datasets. Unlike

other algorithms that find clusters only in relatively dense subspaces, our algorithm

finds clusters in all subspaces. We introduce the concept of weighted density in or-

der to find interesting clusters in relatively sparse subspaces. Experimental results

show that our algorithm is very e cient, and uses much less memory than other

algorithms.

1 Introduction

Frequency has been used for finding interesting patterns in various data min-

ing problems, such as the minimum support threshold used in mining frequent

itemsets [2,3] and the minimum density defined in mining subspace clusters [1].

A priori-like algorithms [1] perform levelwise searches for all patterns having

enough frequencies (either support or density) starting from single dimen-

sions, and prune the search space based on the rationale that in order for a

k− dimensional pattern to be frequent, all its ( k− 1) − dimensional sub-patterns

must also be frequent. A large frequency threshold is usually set in most of the

algorithms to control the exponential growth of the search space as a function

of the highest dimensionality of the frequent patterns.

Closed patterns was proposed [7] to reduce the number of frequent patterns

being returned by the algorithm without losing any information. Mining closed

patterns is lossless in the sense that all frequent patterns can be inferred

from the set of closed patterns. Most algorithms proposed for mining closed

patterns require all candidates found so far to be kept in memory to avoid

duplicates [9, 11, 12]. These algorithms also require the minimum frequency

threshold value to be specified before the algorithms are run, and the same

value is used to prune off candidates for patterns in all subspaces.

Search WWH ::

Custom Search

Home