A Data Mining Software Package Including Data Preparation and Reduction: KEEL - Data Preprocessing in Data Mining - page 300


Table 10.4 Parameter values employed in the experimental study

Algorithm   Parameters

Ant-Miner   Number of ants: 3000, Maximum uncovered samples: 10,
            Maximum samples by rule: 10,
            Maximum iterations without converge: 10

CORE        Population size: 100, Co-population size: 50,
            Generation limit: 100, Number of co-populations: 15,
            Crossover rate: 1.0, Mutation probability: 0.1,
            Regeneration probability: 0.5

HIDER       Population size: 100, Number of generations: 100,
            Mutation probability: 0.5, Cross percent: 80,
            Extreme mutation probability: 0.05,
            Prune examples factor: 0.05, Penalty factor: 1,
            Error coefficient: 1

SGERD       Number of Q rules per class: Computed heuristically,
            Rule evaluation criteria = 2

TARGET      Probability of splitting a node: 0.5,
            Number of total generations for the GA: 100,
            Number of trees generated by crossover: 30,
            Number of trees generated by mutation: 10,
            Number of trees generated by clonation: 5,
            Number of trees generated by immigration: 5

five learning methods used (Clas-AntMiner, Clas-SGERD, Clas-Target, Clas-Hider and Clas-CORE).

After the models are trained, the instances of the data set are classified. These results are the inputs for the visualization and test modules. The module Vis-Clas-Tabular receives these results as input and generates output files with several performance metrics computed from them, such as confusion matrices for each method, accuracy and error percentages for each method, fold and class, and a final summary of results. Figure 10.9 also shows another type of results flow: the node Stat-Clas-Friedman, which represents the statistical comparison. Here the results are collected and a statistical analysis over multiple data sets is performed by following the indications given in [38].
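The metrics named above can be illustrated with a minimal Python sketch. It is not KEEL's implementation: the function names and input layout are assumptions made for this example, and the Friedman statistic is the standard chi-square form computed from average ranks, without a tie correction.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return matrix[true_label][predicted_label] -> number of instances."""
    counts = Counter(zip(actual, predicted))
    return {t: {p: counts[(t, p)] for p in labels} for t in labels}


def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the true class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def per_class_accuracy(matrix):
    """Fraction of each true class that was classified correctly."""
    result = {}
    for label, row in matrix.items():
        total = sum(row.values())
        result[label] = row[label] / total if total else 0.0
    return result


def friedman_statistic(errors):
    """Average ranks and Friedman chi-square statistic.

    errors[i][j] is the error of method j on data set i (lower is better).
    Tied values within a data set share the average of their ranks.
    """
    n = len(errors)      # number of data sets
    k = len(errors[0])   # number of methods
    rank_sums = [0.0] * k
    for row in errors:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            avg_rank = (pos + end) / 2 + 1  # ranks are 1-based
            for idx in order[pos:end + 1]:
                rank_sums[idx] += avg_rank
            pos = end + 1
    avg_ranks = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    return avg_ranks, chi2
```

Here `errors` would hold one row per data set and one column per method, for instance the five classifiers of this study. The resulting statistic is compared against a chi-square distribution with k - 1 degrees of freedom.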

Once the graph is defined, we can set up the associated experiment and save it as a zip file for an off-line run. The experiment is thus packaged as a set of XML scripts and a JAR program for running it. Within the results directory, there will be directories used for housing the results of each method during the run. For example, the files placed in the directory associated with an interval learning algorithm will contain the knowledge or rule base. In the case of a visualization procedure, its directory will house the results files. The results obtained by the analyzed methods are shown in the next section, together with the statistical analysis.
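The off-line workflow described above (unpack the zip file, launch the JAR program, then inspect the per-method results directories) could be scripted roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, and the archive layout and JAR file name are not given in the text, so they are left as arguments for the caller to supply.

```python
import zipfile
from pathlib import Path


def extract_experiment(zip_path, target_dir):
    """Unpack the saved experiment archive and return the extraction directory."""
    target = Path(target_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def build_run_command(jar_path):
    """Build the command line that launches the experiment's JAR program.

    The JAR location is an assumption supplied by the caller.
    """
    return ["java", "-jar", str(jar_path)]


def list_result_dirs(results_dir):
    """Return the names of the per-method directories under the results directory."""
    return sorted(p.name for p in Path(results_dir).iterdir() if p.is_dir())
```

The command returned by `build_run_command` can be passed to `subprocess.run(..., check=True)` from the directory that contains the XML scripts. After the run, `list_result_dirs` shows one directory per method or visualization node.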
