Interactive Comprehensible Data Mining
Andy Pryke and Russell Beale
University of Birmingham, United Kingdom
{ A.N.Pryke,R.Beale } @cs.bham.ac.uk
1 Problem
1.1 What Is Interesting?
In data mining, or knowledge discovery, we are essentially faced with a mass
of data that we are trying to make sense of. We are looking for something
“interesting”. Quite what “interesting” means is hard to define, however: one
day we are intrigued by the general trend that most of the data follows, the
next by why there are a few outliers to that trend. For a data mining system
to be generically useful to us, it must therefore have some way in which we
can indicate what is interesting and what is not, and that indication must be
dynamic and changeable.
Once we can ask the question appropriately, we then need to be able to
understand the answers that the system gives us. It is therefore important that
the responses of the system are represented in ways that we can understand.
Whilst complex statistical measures of the data set may be accurate, if they are
not comprehensible to the users they do not offer insight, only description.
One concept that we consider vital is recognizing the relative strengths of
users and computers. The human visual system is exceptionally good at
clustering and at recognizing patterns and trends, even in the presence of
noise and distortion. Computer systems are exceptionally good at crunching
numbers, producing exact parameterizations, and exploring large numbers of
alternatives. If we can combine the best of human and computer processing, we
should be able to develop systems that are superior to either approach alone.
An ideal data mining system should, we would argue, offer the above
characteristics: the ability to define what is interesting, the use of the
user and the computer in the tasks to which each is best suited, and
explanations of the data that are understandable and provide deep insights.
This leads us towards a system that will be interactive, in order to be flexible
and work towards a solution. It should use visualization techniques to offer the
user the opportunity to do both perceptual clustering and trend analysis, and
to offer a mechanism for feeding back the results of machine-based data mining.
It should have a data mining engine that is powerful, effective, and which can
produce humanly comprehensible results as well.
The Haiku system was developed with these principles in mind, and offers a
symbiotic system that couples interactive 3-d dynamic visualization technology
with a novel genetic algorithm.