In the first sub-step, we reduce the dimensionality of transactions in order to
enhance the quality of data mining and decrease the computational cost of the
MOUCLAS algorithm. Since, for attributes A_j, j = 1 to l, in database D, an
exhaustive search for the optimal subset of attributes among the 2^l possible subsets
can be prohibitively expensive, especially in high dimensional databases, we use
heuristic methods to reduce the search space. Such greedy methods are effective in
practice, and include techniques such as stepwise forward selection, stepwise
backward elimination, and the combination of forward selection and backward
elimination. The first sub-step is particularly important when dealing with raw data
sets. Detailed methods concerning dimensionality reduction can be found in the
literature 15-18 .
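The greedy search named above can be sketched as follows. This is a minimal illustration of stepwise forward selection, not the paper's implementation: the scoring function `toy_score` and the attribute names are hypothetical stand-ins for whatever mining-quality criterion an implementation would use.

```python
def forward_selection(attributes, score, max_attrs=None):
    """Greedily add the attribute that most improves score(subset).

    `score` maps a frozenset of attribute names to a number; the loop
    stops when no single additional attribute improves the score.
    """
    selected = frozenset()
    best = score(selected)
    while max_attrs is None or len(selected) < max_attrs:
        candidates = [(score(selected | {a}), a)
                      for a in attributes if a not in selected]
        if not candidates:
            break
        cand_score, cand_attr = max(candidates)
        if cand_score <= best:          # no improvement: stop
            break
        selected, best = selected | {cand_attr}, cand_score
    return selected

# Toy score (illustration only): rewards a fixed "informative" subset
# and mildly penalizes irrelevant attributes.
informative = {"A1", "A3"}
toy_score = lambda s: len(s & informative) - 0.1 * len(s - informative)
print(forward_selection(["A1", "A2", "A3", "A4"], toy_score))
# selects exactly A1 and A3
```

Stepwise backward elimination is the mirror image: start from the full attribute set and greedily drop the attribute whose removal improves the score most.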
Fuzzy-based clustering is performed in the second sub-step to find the clusters of
quantitative data. The Mountain-climb technique proposed by R. R. Yager and D. P.
Filev 19 employed the concept of a mountain function, a fuzzy set membership
function, in determining cluster centers used to initialize a Neuro-Fuzzy system.
The subtractive clustering technique 20 was defined as an improvement of
Mountain-climb clustering. A similar approach is provided by the DENCLUE
algorithm 21 , which is especially efficient for clustering high dimensional
databases with noise. The techniques of Mountain-climb clustering, subtractive
clustering and DENCLUE deal with quantitative attributes through mountain
functions (or influence functions), an approach that has a solid mathematical
foundation and a compact mathematical description and is entirely different from the
traditional processing method of binning. It offers us an opportunity to mine patterns
in the data from an innovative angle. As a result, part of the research task presented
in the introduction can now be favorably answered.
The observation that a region which is dense in a particular subspace must create
dense regions when projected onto lower dimensional subspaces has been proved by
R. Agrawal and his research collaborators in CLIQUE 22 . In other words, the
observation follows the concept of the apriori property. Hence, we may employ prior
knowledge of items in the search space based on this property so that portions of the
space can be pruned. The successful performance of CLIQUE has again proved the
feasibility of applying the apriori property to clustering. It brings us a step further
towards the solution of the remaining part of the research task: if the initial
association rules can be agglomerated into clustering regions, just like the condition in
ARCS, we may be able to design a new classifier for the purpose of classification,
which confines its search to the clusters of dense units in high
dimensional space. The answer to this remaining research task contributes to the third
sub-step of the MOUCLAS algorithm, the forming of the antecedents of
cluster_rules , with any number of predicates in the antecedent. In the third sub-step,
we identify the candidate cluster_rules which are actually frequent, accurate and
reliable. From this set of frequent, accurate and reliable cluster_rules , we
produce a set of MPs .
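The apriori-style pruning that CLIQUE applies to dense units can be sketched as follows. The grid encoding of units as (dimension, interval) pairs and the example density results are illustrative choices, not CLIQUE's exact data structures: a candidate k-dimensional unit is generated only if every one of its (k-1)-dimensional projections is already known to be dense.

```python
from itertools import combinations

def candidate_units(dense_prev, k):
    """Join (k-1)-dimensional dense units and prune by the apriori property.

    A unit is a frozenset of (dimension, interval_index) pairs.
    """
    candidates = set()
    for u, v in combinations(dense_prev, 2):
        merged = u | v
        if len(merged) != k:
            continue
        # Apriori pruning: every (k-1)-subset must itself be dense.
        if all(frozenset(s) in dense_prev
               for s in combinations(merged, k - 1)):
            candidates.add(merged)
    return candidates

# Dense 1-d units found in dimensions 0, 1 and 2.
dense1 = {frozenset({(0, 3)}), frozenset({(1, 7)}), frozenset({(2, 5)})}
dense2 = candidate_units(dense1, 2)   # all three 2-d pairs are candidates
# Suppose counting then shows only two of them are actually dense:
dense2 = {frozenset({(0, 3), (1, 7)}), frozenset({(1, 7), (2, 5)})}
dense3 = candidate_units(dense2, 3)
print(dense3)  # empty: the unit {(0,3),(2,5)} is not dense, so no
               # 3-d candidate survives the pruning step
```

This is exactly the pruning that lets a CLIQUE-style search avoid counting the vast majority of the 2^l subspaces.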
Let I be the set of all items in D , and C be the dataset D after dimensionality
reduction, where a transaction d ∈ C contains X ⊆ I , a k- itemset. Let E denote
the set of candidates