Mining MOUCLAS Patterns and Jumping MOUCLAS
Patterns to Construct Classifiers
Yalei Hao 1, Gerald Quirchmayr 1,2, and Markus Stumptner 1
1 Advanced Computing Research Centre, University of South Australia,
SA5095, Australia
2 Institut für Informatik und Wirtschaftsinformatik, Universität Wien,
Liebiggasse 4, A-1010 Wien, Austria
Yalei.Hao@postgrads.unisa.edu.au,
Gerald.Quirchmayr@unisa.edu.au, mst@cs.unisa.edu.au
Abstract. This paper proposes a novel mining approach consisting of two
new data mining algorithms for classification over quantitative data, based
on two new kinds of patterns called MOUCLAS (MOUntain function based
CLASsification) Patterns (MPs) and Jumping MOUCLAS Patterns (JMPs). The
motivation of the study is to develop two classifiers for quantitative
attributes by combining the concepts of association rules and clustering.
An illustration using petroleum well logging data for oil/gas formation
identification is presented in the paper.
MPs and JMPs are ideally suited to deriving the implicit relationship between
measured values (well logging data) and properties to be predicted (oil/gas
formation or not). As a hybrid of classification, clustering, and association
rule mining, our approach has several advantages: (1) it has a solid
mathematical foundation and a compact mathematical description of classifiers,
(2) it does not require discretization, and (3) it is robust when handling noisy
or incomplete data in high dimensional data space.
1 Introduction
Data mining based classification aims to build accurate and efficient classifiers not
only on small data sets but, more importantly, also on large and high dimensional data
sets, a task for which the widely used traditional statistical data analysis techniques
are not sufficiently powerful [1, 2]. With the development of new data mining
techniques for association rules, new classification approaches based on concepts from
association rule mining are emerging. These include classifiers such as ARCS [3],
CBA [4], LB [5], and JEP [6], which differ from the classic decision tree based
classifier C4.5 [7] and k-nearest neighbor [8] in both the learning and testing phases. To
improve on ARCS [3], a non-grid-based technique [9] has been proposed to find
quantitative association rules that can have more than two predicates in the
antecedent. All of the above algorithms are constrained by the framework of binning.
Though several excellent discretization algorithms [10, 11] have been proposed, a standard
approach to discretization has not yet been developed.
 