The simplest of these (data identification) is to view the identity or details
of items in the feature, or export this information to a file for later use.
Another option is to re-visualize the data set without the selected data, or
to focus in and visualize only the selected data. This can be used to exclude
distorting outliers, or to concentrate on the interactions within an area of
interest. Of course, we can data mine the whole data set without making any
selection, which is the approach taken by many other systems. One of the
features of the Haiku system is that the user interactively indicates the items
currently of interest, and the knowledge discovery process then focuses on
categorizing or distinguishing that data.
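The two re-visualization choices described above (dropping the selection, or keeping only the selection) amount to complementary filters over the data set. The following is a minimal sketch under stated assumptions: the data set is modeled as a dictionary keyed by item identifier, and the function names `exclude_selection` and `focus_on_selection` are hypothetical, not part of the Haiku system.

```python
from typing import Dict, Hashable, Iterable, TypeVar

V = TypeVar("V")


def exclude_selection(data: Dict[Hashable, V],
                      selected_ids: Iterable[Hashable]) -> Dict[Hashable, V]:
    """Return the data set without the selected items (e.g. to drop outliers)."""
    selected = set(selected_ids)
    return {key: value for key, value in data.items() if key not in selected}


def focus_on_selection(data: Dict[Hashable, V],
                       selected_ids: Iterable[Hashable]) -> Dict[Hashable, V]:
    """Return only the selected items (e.g. to study one area of interest)."""
    selected = set(selected_ids)
    return {key: value for key, value in data.items() if key in selected}
```

Either result could then be passed back to the visualization or to a data mining step; the original data set is left unmodified.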
A key feature of the system is that the user selection process takes full
advantage of the abilities of our visual system: humans are exceptionally good at
picking up gross features of visual representations. Our abilities have evolved
to work well in the presence of noise and of missing or obscured data, and we are
able to pick out simple lines and curves as well as more complex features
such as spirals and undulating waves or planes. By allowing user input into the
knowledge discovery process, we can exploit this highly efficient system very
quickly, as well as reduce the work that the computational system has to do.
The third option asks the machine to process the selected data. This is the
most striking feature of the system: its ability to “explain” why features of
interest exist. Typical questions when looking at a visual representation of data
are: “Why are these items out on their own?”, “What are the characteristics of
this cluster?”, “How do these two groups of items differ?”. Applying a machine
learning component generates answers to these types of question.
The interaction works as follows: first, one or more groups are selected.
Then the option to explain the groups is selected. The user answers a small
number of questions about their preferences for the explanation (short/long;
highly accurate/general characteristics, etc.). The system then returns a set of
rules describing the features selected, and ensures that the rules conform to the
level of detail that the user requires.
As an alternative, the classic machine learning system C4.5 [4] may be used
to generate classification rules. Other data mining systems may also be applied
by saving the selected feature information to an external file.
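Saving the selected items to an external file can be sketched as follows. This is an illustration only: the source does not specify a file format, so CSV is an assumption, as are the function name `export_selection` and the `id` field used to identify each item.

```python
import csv
from typing import Dict, Iterable, List, TextIO


def export_selection(rows: Iterable[Dict[str, object]],
                     selected_ids: Iterable[object],
                     fieldnames: List[str],
                     out: TextIO) -> int:
    """Write the selected rows to `out` as CSV and return how many were written.

    Each row is assumed to carry its identifier under the key "id".
    Keys not listed in `fieldnames` are silently skipped.
    """
    selected = set(selected_ids)
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for row in rows:
        if row["id"] in selected:
            writer.writerow(row)
            written += 1
    return written
```

When writing to a real file, open it with `newline=""` so that the `csv` module controls line endings, as the Python documentation recommends.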
2.4 Knowledge Visualization and Feedback
The results from the GA can be fed back into the visualization to give extra
insight into their relationships with the data. Identified clusters can be colored,
for example, or rules added and linked to the data that they classify, as in Fig. 2.
In this figure, classification rules are the large spheres, with the data being
the smaller spheres. Both are colored by class. Rules form part of an ordered rule
set. If the first matching rule in the rule set correctly classifies an item of data,
they are linked with a white line. If the rule set classification is incorrect, the
rule and data are linked with a red (dark grey) line. Cyan (or light grey) links
are between the rules and the data that are covered by rules further down the
rule set. The visualization reorganizes itself to show these relationships clearly.
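The link coloring described above can be sketched in code. This is a minimal illustration, not the system's implementation: the `Rule` class and `link_colors` function are hypothetical names, and the sketch assumes one reading of the text, namely that a cyan link joins an item to every rule after its first matching rule that also covers it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Attributes = Dict[str, float]


@dataclass
class Rule:
    name: str
    predicted_class: str
    matches: Callable[[Attributes], bool]  # True if the rule covers the item


def link_colors(rules: List[Rule],
                items: List[Tuple[str, Attributes, str]]) -> List[Tuple[str, str, str]]:
    """Return (rule name, item id, color) links for an ordered rule set.

    Each item is (item id, attributes, true class). The first matching rule
    gets a white link if it classifies the item correctly and a red link
    otherwise. Later rules that also cover the item get cyan links (assumed
    interpretation).
    """
    links = []
    for item_id, attributes, true_class in items:
        first_match_seen = False
        for rule in rules:  # rules are checked in rule-set order
            if not rule.matches(attributes):
                continue
            if not first_match_seen:
                color = "white" if rule.predicted_class == true_class else "red"
                first_match_seen = True
            else:
                color = "cyan"
            links.append((rule.name, item_id, color))
    return links
```

Items that no rule covers produce no links in this sketch; how the visualization treats such items is not stated in the text.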