Interactive Comprehensible Data Mining - Ambient Intelligence for Scientific Discovery


The simplest of these (data identification) is to view the identity or details

of items in the feature, or export this information to a file for later use.

Another option is to re-visualize the data set without the selected data or

to focus in and only visualize the selected data. This can be used to exclude

distorting outliers, or to concentrate on the interactions within an area of

interest. Of course, we can data mine the whole data set without doing this, which is the

approach taken by many other systems. One of the features of the Haiku system

is the interactive indication of the things that we are currently interested in, and

the subsequent focusing of the knowledge discovery process on categorizing or

distinguishing that data.
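
The exclude-or-focus step described above can be sketched as a simple partition of the data set. This is only an illustrative sketch, not the Haiku implementation: the function name `split_by_selection`, the dictionary layout, and the toy values are all assumptions made for the example.

```python
from typing import Dict, Hashable, Iterable, Tuple

Row = Dict[str, float]


def split_by_selection(
    data: Dict[Hashable, Row], selected: Iterable[Hashable]
) -> Tuple[Dict[Hashable, Row], Dict[Hashable, Row]]:
    """Return (focus, remainder): the selected items and everything else."""
    chosen = set(selected)
    focus = {key: row for key, row in data.items() if key in chosen}
    remainder = {key: row for key, row in data.items() if key not in chosen}
    return focus, remainder


# Toy data set: item id -> attribute values (invented for illustration).
data = {
    "a": {"x": 1.0, "y": 2.0},
    "b": {"x": 1.2, "y": 1.9},
    "c": {"x": 40.0, "y": -7.0},  # a distorting outlier the user has selected
}

focus, remainder = split_by_selection(data, ["c"])
```

Re-visualizing `remainder` corresponds to excluding the selection, while re-visualizing `focus` corresponds to concentrating on the area of interest.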

A key feature of the system is that the user selection process takes full

advantage of the abilities of our visual system: humans are exceptionally good at

picking up gross features of visual representations. Our abilities have evolved

to work well in the presence of noise, of missing or obscured data, and we are

able to pick out both simple lines and curves as well as more complex features

such as spirals and undulating waves or planes. By allowing user input into the

knowledge discovery process, we can put a highly efficient system, human vision, to use very

quickly, while also reducing the work that the computational system has to do.

The third option asks the machine to process the selected data. This is the

most striking feature of the system: its ability to “explain” why features of

interest exist. Typical questions when looking at a visual representation of data

are: “Why are these items out on their own?”, “What are the characteristics of

this cluster?”, “How do these two groups of items differ?”. Applying a machine

learning component generates answers to these types of question.

The interaction works as follows: first, a group or number of groups is selected.

Then the option to explain the groups is selected. The user answers a small

number of questions about their preferences for the explanation (short/long;

highly accurate/general characteristics, etc.). The system then returns a set of

rules describing the features selected, and ensures that the rules conform to the

level of detail that the user requires.

As an alternative, the classic machine learning system C4.5 [4] may be used

to generate classification rules. Other data mining systems may also be applied

by saving the selected feature information to an external file.
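
One way to picture the hand-off to an external data mining tool is to write every item to a file with an extra column saying whether it was part of the selected feature. The sketch below is an assumption-laden illustration rather than the file format Haiku actually uses: the function `export_selection`, the column name `group`, and the labels `selected` and `rest` are all hypothetical.

```python
import csv
import io
from typing import Dict, Hashable, Iterable, TextIO


def export_selection(
    data: Dict[Hashable, Dict[str, object]],
    selected: Iterable[Hashable],
    out: TextIO,
    label_column: str = "group",
) -> None:
    """Write all items as CSV, labelling each row 'selected' or 'rest'."""
    chosen = set(selected)
    # Use a stable, sorted attribute order so the header matches every row.
    fields = sorted({name for row in data.values() for name in row})
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", *fields, label_column])
    for key, row in data.items():
        label = "selected" if key in chosen else "rest"
        writer.writerow([key, *(row.get(name, "") for name in fields), label])


# Write to an in-memory buffer here; a real file object works the same way.
buffer = io.StringIO()
export_selection({"a": {"x": 1, "y": 2}, "b": {"x": 3, "y": 4}}, ["b"], buffer)
exported = buffer.getvalue()
```

An external learner can then treat the `group` column as the class to be explained, which mirrors the idea of distinguishing the selected data from the rest.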

2.4 Knowledge Visualization and Feedback

The results from the GA can be fed back into the visualization to give extra

insight into their relationships with the data. Identified clusters can be colored,

for example, or rules added and linked to the data that they classify, as in Fig. 2.

In this figure, classification rules are the large spheres, with the data being

the smaller spheres. Both are colored by class. Rules form part of an ordered rule

set. If the first matching rule in the rule set correctly classifies an item of data,

the rule and the item are linked with a white line. If the rule set classification is incorrect, the

rule and data are linked with a red (dark grey) line. Cyan (or light grey) links

are between the rules and the data that are covered by rules further down the

rule set. The visualization reorganizes itself to show these relationships clearly.
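
The link-coloring scheme for an ordered rule set can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's own code: the `Rule` class, the `link_colors` function, and the example rules are hypothetical, and the sketch reads "rules further down the rule set" as any later rule that also covers the item after the first matching rule has fired.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Item = Dict[str, float]


@dataclass
class Rule:
    name: str
    covers: Callable[[Item], bool]  # does this rule's condition match the item?
    predicted_class: str


def link_colors(
    rules: List[Rule], item: Item, true_class: str
) -> List[Tuple[str, str]]:
    """Return (rule name, link color) pairs for one item, in rule-set order."""
    links = []
    fired = False  # becomes True once the first matching rule is found
    for rule in rules:
        if not rule.covers(item):
            continue
        if not fired:
            # The first matching rule determines the rule set's classification.
            color = "white" if rule.predicted_class == true_class else "red"
            fired = True
        else:
            # Later rules that also cover the item get the secondary link color.
            color = "cyan"
        links.append((rule.name, color))
    return links


# Hypothetical ordered rule set over two numeric attributes.
rules = [
    Rule("r1", lambda d: d["x"] > 5, "high"),
    Rule("r2", lambda d: d["y"] < 0, "low"),
    Rule("r3", lambda d: True, "low"),  # catch-all default rule
]

correct = link_colors(rules, {"x": 7.0, "y": -1.0}, "high")
wrong = link_colors(rules, {"x": 7.0, "y": 1.0}, "low")
```

Here `correct` links the item to `r1` in white and to `r2` and `r3` in cyan, while `wrong` links it to `r1` in red (the rule set misclassifies it) and to `r3` in cyan.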
