Interactive Comprehensible Data Mining - Ambient Intelligence for Scientific Discovery

Interactive Comprehensible Data Mining

Andy Pryke and Russell Beale

University of Birmingham, United Kingdom

{A.N.Pryke, R.Beale}@cs.bham.ac.uk

1 Problem

1.1 What Is Interesting?

In data mining, or knowledge discovery, we are faced with a mass of data that
we are trying to make sense of. We are looking for something “interesting”.
Quite what “interesting” means is hard to define, however: one day we are
intrigued by the general trend that most of the data follows, and the next by
why there are a few outliers to that trend. For a data mining system to be
generically useful to us, it must therefore offer some way for us to indicate
what is interesting and what is not, and that indication must be dynamic and
changeable.
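
As a minimal sketch of what a changeable notion of “interesting” could look
like, the Python fragment below lets the user switch between points that follow
a fitted linear trend and points that deviate from it. The function names, the
`mode` and `threshold` parameters, and the choice of a least-squares line are
all illustrative assumptions; this is not the mechanism used by the system
described in this chapter.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope and intercept of a straight line through the data."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def interesting(xs, ys, mode, threshold):
    """Return the indices of the points the user currently calls interesting.

    mode="trend"    -> points within `threshold` of the fitted line
    mode="outliers" -> points farther than `threshold` from the fitted line
    Both `mode` and `threshold` are hypothetical, user-adjustable settings.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    if mode == "trend":
        return [i for i, r in enumerate(residuals) if r <= threshold]
    if mode == "outliers":
        return [i for i, r in enumerate(residuals) if r > threshold]
    raise ValueError("mode must be 'trend' or 'outliers'")
```

Calling `interesting(xs, ys, "trend", 5)` one day and
`interesting(xs, ys, "outliers", 5)` the next answers two different questions
about the same data, which is the kind of flexibility argued for above.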

Once we can ask the question appropriately, we then need to be able to
understand the answers that the system gives us. It is therefore important that
the responses of the system are represented in ways that we can understand.
Whilst complex statistical measures of the data set may be accurate, if they are
not comprehensible to the users they do not offer insight, only description.

One concept that we consider to be vital is to recognize the relative strengths
of users and computers. The human visual system is exceptionally good at
clustering, at recognizing patterns and trends, even in the presence of noise
and distortion. Computer systems are exceptionally good at crunching numbers,
producing exact parameterizations and exploring large numbers of alternatives.
If we can combine the best of human and computer processing, we should be able
to develop systems that are superior to either approach alone.

An ideal data mining system should, we would argue, offer the above
characteristics: the ability to define what is interesting, the use of the
abilities of the user and the computer in the tasks to which each is best
suited, and explanations of the data that are understandable and provide deep
insights.

This leads us towards a system that is interactive, so that it can be flexible
and work towards a solution. It should use visualization techniques to give the
user the opportunity to do both perceptual clustering and trend analysis, and
to provide a mechanism for feeding back the results of machine-based data
mining. It should also have a data mining engine that is powerful and
effective, and that produces humanly comprehensible results.

The Haiku system was developed with these principles in mind, and offers a
symbiotic system that couples interactive 3D dynamic visualization technology
with a novel genetic algorithm.
