In the first sub-step, we reduce the dimensionality of transactions in order to
enhance the quality of data mining and decrease the computational cost of the
MOUCLAS algorithm. Since, for attributes A_j, j = 1 to l, in database D, an
exhaustive search for the optimal subset of attributes among the 2^l possible subsets
can be prohibitively expensive, especially in high dimensional databases, we use
heuristic methods to reduce the search space. Such greedy methods are effective in
practice, and include techniques such as stepwise forward selection, stepwise
backward elimination, and the combination of forward selection and backward
elimination. The first sub-step is particularly important when dealing with raw data
sets. Detailed methods concerning dimensionality reduction can be found in the
literature 15-18 .
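The greedy search named above can be sketched as follows. This is a minimal illustration of stepwise forward selection, not the paper's implementation: the scoring function `toy_score` and the attribute names are hypothetical stand-ins for whatever mining-quality criterion an implementation would use.

```python
def forward_selection(attributes, score, max_attrs=None):
    """Greedily add the attribute that most improves score(subset).

    `score` maps a frozenset of attribute names to a number; the loop
    stops when no single additional attribute improves the score.
    """
    selected = frozenset()
    best = score(selected)
    while max_attrs is None or len(selected) < max_attrs:
        candidates = [(score(selected | {a}), a)
                      for a in attributes if a not in selected]
        if not candidates:
            break
        cand_score, cand_attr = max(candidates)
        if cand_score <= best:          # no improvement: stop
            break
        selected, best = selected | {cand_attr}, cand_score
    return selected

# Toy score (illustration only): rewards a fixed "informative" subset
# and mildly penalizes irrelevant attributes.
informative = {"A1", "A3"}
toy_score = lambda s: len(s & informative) - 0.1 * len(s - informative)
print(forward_selection(["A1", "A2", "A3", "A4"], toy_score))
# selects exactly A1 and A3
```

Stepwise backward elimination is the mirror image: start from the full attribute set and greedily drop the attribute whose removal improves the score most.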
Fuzzy-based clustering is performed in the second sub-step to find the clusters of
quantitative data. The Mountain-climb technique proposed by R. R. Yager and D. P.
Filev 19 employed the concept of a mountain function, a fuzzy set membership
function, in determining cluster centers used to initialize a Neuro-Fuzzy system.
The subtractive clustering technique 20 was defined as an improvement of
Mountain-climb clustering. A similar approach is provided by the DENCLUE
algorithm 21 , which is especially efficient for clustering high dimensional
databases with noise. The techniques of Mountain-climb clustering, subtractive
clustering and DENCLUE deal with quantitative attributes through mountain
functions (or influence functions), an approach that has a solid mathematical
foundation and a compact mathematical description and is entirely different from the
traditional processing method of binning. It offers us an opportunity to mine patterns
in the data from an innovative angle. As a result, part of the research task presented
in the introduction can now be favorably answered.
The observation that a region which is dense in a particular subspace must create
dense regions when projected onto lower dimensional subspaces has been proved by
R. Agrawal and his research collaborators in CLIQUE 22 . In other words, the
observation follows the concept of the apriori property. Hence, we may employ prior
knowledge of items in the search space based on this property so that portions of the
space can be pruned. The successful performance of CLIQUE has again proved the
feasibility of applying the apriori property to clustering. It brings us a step further
towards the solution of the remaining part of the research task: if the initial
association rules can be agglomerated into clustering regions, just like the condition in
ARCS, we may be able to design a new classifier for the purpose of classification,
which confines its search to the clusters of dense units in high
dimensional space. The answer to this remaining research task contributes to the third
sub-step of the MOUCLAS algorithm, the forming of the antecedents of
cluster_rules , with any number of predicates in the antecedent. In the third sub-step,
we identify the candidate cluster_rules which are actually frequent, accurate and
reliable. From this set of frequent, accurate and reliable cluster_rules , we
produce a set of MPs .
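The apriori-style pruning that CLIQUE applies to dense units can be sketched as follows. The grid encoding of units as (dimension, interval) pairs and the example density results are illustrative choices, not CLIQUE's exact data structures: a candidate k-dimensional unit is generated only if every one of its (k-1)-dimensional projections is already known to be dense.

```python
from itertools import combinations

def candidate_units(dense_prev, k):
    """Join (k-1)-dimensional dense units and prune by the apriori property.

    A unit is a frozenset of (dimension, interval_index) pairs.
    """
    candidates = set()
    for u, v in combinations(dense_prev, 2):
        merged = u | v
        if len(merged) != k:
            continue
        # Apriori pruning: every (k-1)-subset must itself be dense.
        if all(frozenset(s) in dense_prev
               for s in combinations(merged, k - 1)):
            candidates.add(merged)
    return candidates

# Dense 1-d units found in dimensions 0, 1 and 2.
dense1 = {frozenset({(0, 3)}), frozenset({(1, 7)}), frozenset({(2, 5)})}
dense2 = candidate_units(dense1, 2)   # all three 2-d pairs are candidates
# Suppose counting then shows only two of them are actually dense:
dense2 = {frozenset({(0, 3), (1, 7)}), frozenset({(1, 7), (2, 5)})}
dense3 = candidate_units(dense2, 3)
print(dense3)  # empty: the unit {(0,3),(2,5)} is not dense, so no
               # 3-d candidate survives the pruning step
```

This is exactly the pruning that lets a CLIQUE-style search avoid counting the vast majority of the 2^l subspaces.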
Let I be the set of all items in D , and C be the dataset D after dimensionality
reduction, where a transaction d ∈ C contains X ⊆ I , a k- itemset. Let E denote
the set of candidates