A number of things are immediately more apparent from this visualization
than from a textual description. On the left of the figure is a group of five green
(mid grey) rules; floating between them is the data they classify correctly. The
rule at the top of the group can also be seen to classify some blue (dark grey)
data incorrectly. Most interesting in the group is the rightmost rule, which not
only “wins” the competition to classify the data correctly but also misclassifies a
number of other data points (red/dark grey links to the right and below). This rule
is clearly too general. The visualization shows that removing this
rule would reduce the number of incorrect classifications without affecting the
number of items correctly classified.
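The reasoning about the over-general rule can be made concrete with a small sketch. It is not the system described here: the `Rule` type, the toy data, and the helper names are all hypothetical. It also assumes that an item counts as correctly classified when at least one matching rule predicts its label, and that every link from a rule to an item with a different label counts as one incorrect classification.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str
    matches: Callable[[dict], bool]  # hypothetical predicate over one data item
    label: str                       # class the rule predicts


def links(rule, data):
    """Return (correct, incorrect) sets of item indices the rule covers."""
    correct, incorrect = set(), set()
    for i, (item, label) in enumerate(data):
        if rule.matches(item):
            (correct if rule.label == label else incorrect).add(i)
    return correct, incorrect


def summary(rules, data):
    """Count items some rule classifies correctly, and all incorrect links."""
    covered, wrong = set(), 0
    for rule in rules:
        correct, incorrect = links(rule, data)
        covered |= correct
        wrong += len(incorrect)
    return len(covered), wrong


def too_general(rules, data):
    """Names of rules whose removal cuts incorrect links but loses no correct item."""
    base_correct, base_wrong = summary(rules, data)
    flagged = []
    for rule in rules:
        rest = [other for other in rules if other is not rule]
        n_correct, n_wrong = summary(rest, data)
        if n_correct == base_correct and n_wrong < base_wrong:
            flagged.append(rule.name)
    return flagged


# Toy data (invented for illustration): three green items and two blue items.
data = [
    ({"x": 1}, "green"),
    ({"x": 2}, "green"),
    ({"x": 3}, "green"),
    ({"x": 7}, "blue"),
    ({"x": 8}, "blue"),
]
rules = [
    Rule("narrow_a", lambda d: d["x"] <= 2, "green"),
    Rule("narrow_b", lambda d: 2 <= d["x"] <= 3, "green"),
    Rule("broad", lambda d: d["x"] <= 8, "green"),  # also covers the blue items
]

print(summary(rules, data))      # correct items, incorrect links
print(too_general(rules, data))  # rules that could be dropped
```

In this toy setting the two narrow rules already cover every green item, so dropping the broad rule removes its two incorrect links while leaving the number of correctly classified items unchanged. This is the same situation the figure shows for the rightmost rule.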
It is interesting to note that as this visualization depends solely on the
relationship between knowledge (e.g. classification rule) and data, it can be applied
to a very wide range of discoveries, including those made by non-symbolic
systems such as neural networks.
The system is fully interactive. The user can identify different characteristics
and instruct the GA to describe them, and so the process continues. This synergy
of abilities between the rapid, parallel exploration of the structure space by
the computer and the user's innate pattern recognition abilities and interest in
different aspects of the data produces a very powerful and flexible system.
2.5 Genetic Algorithms for Data Mining
We use a genetic algorithm (GA) approach for a number of reasons. First, a
GA is able to effectively explore a large search space, and modern computing
power means we can take advantage of this within a reasonable time frame.
Second, one of the key design features is to produce a system that has humanly
Fig. 2. Rule Coverage and Accuracy. Dark links indicate incorrect classification.