Mining MOUCLAS Patterns and Jumping MOUCLAS Patterns to Construct Classifiers - Data Mining: Theory, Methodology, Techniques, and Applications


Mining MOUCLAS Patterns and Jumping MOUCLAS Patterns to Construct Classifiers

Yalei Hao¹, Gerald Quirchmayr¹,², and Markus Stumptner¹

¹ Advanced Computing Research Centre, University of South Australia, SA 5095, Australia

² Institut für Informatik und Wirtschaftsinformatik, Universität Wien, Liebiggasse 4, A-1010 Wien, Austria

Yalei.Hao@postgrads.unisa.edu.au, Gerald.Quirchmayr@unisa.edu.au, mst@cs.unisa.edu.au

Abstract. This paper proposes a novel mining approach consisting of two new data mining algorithms for classification over quantitative data, based on two new pattern types called MOUCLAS (MOUntain function based CLASsification) Patterns (MPs) and Jumping MOUCLAS Patterns (JMPs). The motivation of the study is to develop two classifiers for quantitative attributes by combining the concepts of association rules and clustering. An illustration using petroleum well logging data for oil/gas formation identification is presented. MPs and JMPs are well suited to deriving the implicit relationship between measured values (well logging data) and the properties to be predicted (oil/gas formation or not). As a hybrid of classification, clustering, and association rule mining, our approach has several advantages: (1) it has a solid mathematical foundation and a compact mathematical description of classifiers, (2) it does not require discretization, and (3) it is robust when handling noisy or incomplete data in high-dimensional data space.

1 Introduction

Data mining based classification aims to build accurate and efficient classifiers not only on small data sets but, more importantly, on large and high-dimensional data sets, a task for which the widely used traditional statistical data analysis techniques are not sufficiently powerful [1, 2]. With the development of new data mining techniques for association rules, new classification approaches based on concepts from association rule mining are emerging. These include classifiers such as ARCS [3], CBA [4], LB [5], and JEP [6], which differ from the classic decision-tree-based classifier C4.5 [7] and k-nearest neighbor [8] in both the learning and testing phases. To improve on ARCS [3], a non-grid-based technique [9] has been proposed to find quantitative association rules that can have more than two predicates in the antecedent. All of the above algorithms are constrained by the framework of binning. Although several excellent discretization algorithms [10, 11] have been proposed, a standard approach to discretization has not yet been developed.
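To make the binning constraint concrete, the following is a minimal sketch of equal-width discretization, one common binning scheme and not necessarily the one used by any of the cited classifiers. The function name `equal_width_bin`, the value range, and the two sample readings are assumptions chosen for illustration; they are not taken from the paper or its well logging data.

```python
def equal_width_bin(value, lo, hi, n_bins):
    """Return the 0-based index of the equal-width bin that contains value."""
    if not lo <= value <= hi:
        raise ValueError("value outside [lo, hi]")
    width = (hi - lo) / n_bins
    # The upper edge hi is assigned to the last bin.
    return min(int((value - lo) / width), n_bins - 1)


# Two hypothetical, nearly identical readings of a quantitative attribute.
a, b = 0.49, 0.51

# With 2 bins the boundary at 0.5 separates the two readings.
bins_2 = (equal_width_bin(a, 0.0, 1.0, 2), equal_width_bin(b, 0.0, 1.0, 2))

# With 3 bins both readings fall into the same middle bin.
bins_3 = (equal_width_bin(a, 0.0, 1.0, 3), equal_width_bin(b, 0.0, 1.0, 3))

print(bins_2)  # (0, 1)
print(bins_3)  # (1, 1)
```

Whether two nearly identical values are treated as the same item or as different items depends entirely on where the bin boundaries happen to fall, which is one reason the choice of discretization matters for rule-based classifiers.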
