A Data Mining Software Package Including Data Preparation and Reduction: KEEL - Data Preprocessing in Data Mining


It contains a Knowledge Extraction Algorithms Library that incorporates multiple evolutionary learning algorithms together with classical learning approaches. The principal families of techniques included are:

- Evolutionary rule learning models. Including different paradigms of evolutionary learning.

- Fuzzy systems. Fuzzy rule learning models with a good trade-off between accuracy and interpretability.

- Evolutionary neural networks. Evolution and pruning in artificial neural networks (ANNs), product unit ANNs, and radial basis function network (RBFN) models.

- Genetic programming. Evolutionary algorithms that use tree representations for knowledge extraction.

- Subgroup discovery. Algorithms for extracting descriptive rules based on subgroup discovery of patterns.

- Data reduction (instance and feature selection and discretization). Evolutionary algorithms (EAs) for data reduction.

KEEL integrates the library of algorithms in each of its function blocks. These function blocks were briefly presented above; the following subsections describe the possibilities that KEEL offers in relation to data management, off-line experiment design and on-line educational design.

10.2.2 Data Management

The fundamental purpose of data preparation is to manipulate and transform raw data so that the information content enfolded in the data set can be exposed, or made more accessible [19]. Data preparation comprises those techniques concerned with analyzing raw data so as to yield quality data, mainly including data collection, data integration, data transformation, data cleaning, data reduction and data discretization [20]. Data preparation can be even more time consuming than DM, and can present similar challenges. Its importance lies in the fact that real-world data is impure (incomplete, noisy and inconsistent), whereas high-performance mining systems require quality data (free of anomalies and duplications). Quality data yields high-quality patterns, which is why data preparation aims to recover missing data, purify data and resolve conflicts.
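To make three of the tasks named above concrete (data cleaning, recovery of missing data, and discretization), the following is a minimal Python sketch. It is not KEEL code and does not use KEEL's API: the function names `remove_duplicates`, `impute_mean` and `equal_width_bins`, the toy records, and the choice of mean imputation and equal-width binning are all assumptions made purely for illustration.

```python
from statistics import mean


def remove_duplicates(rows):
    """Data cleaning: drop exact duplicate records, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_mean(rows, col):
    """Missing data: replace None in one numeric column with the observed mean."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = mean(observed)
    return [
        r[:col] + [fill if r[col] is None else r[col]] + r[col + 1:]
        for r in rows
    ]


def equal_width_bins(values, n_bins):
    """Discretization: map each value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all values match
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical raw records: one numeric attribute and one class label.
raw = [
    [5.1, "a"],
    [None, "b"],   # incomplete record
    [5.1, "a"],    # duplicate of the first record
    [7.0, "b"],
    [6.3, "a"],
]

clean = impute_mean(remove_duplicates(raw), col=0)
bins = equal_width_bins([r[0] for r in clean], n_bins=3)
```

Each step is a separate function so that it can be applied and inspected on its own, which mirrors the idea of treating data preparation as a stage independent of the mining step. Real tools offer many alternative strategies for each task; the ones shown here are simply the easiest to read.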

The Data Management module integrated in KEEL allows us to perform the data preparation stage independently of the remaining DM processes. This module is aimed at the group of users denoted as domain experts: they are familiar with their data, they know the processes that produce the data, and they are interested in reviewing them with a view to improving or analyzing them. Domain users, on the other hand, are those whose interest lies in applying processes to their own data and who are usually not experts in DM.
