Interactive Comprehensible Data Mining - Ambient Intelligence for Scientific Discovery


Object Mapping. Object mapping embeds information about some or all of the dimensions in the appearance of an object that represents a data item. Most of this work takes an iconic approach. For example, Chernoff faces [17] map different dimensions to the appearance of the eyes, ears, mouth, etc. of a computer-generated face. Another technique, parallel co-ordinates [18-21, 38], represents an object in multidimensional space by the points where a line crosses a set of parallel axes, one axis per dimension. These lines can be thought of as dynamically generated icons.
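The parallel co-ordinates idea can be made concrete with a small sketch that computes, for each data item, the points where its line crosses the axes. It assumes that axis i is the vertical line x = i and that each dimension is min-max rescaled to [0, 1] so all axes share one scale; the source does not specify a scaling, and the helper names `normalise_columns` and `polylines` are hypothetical.

```python
from typing import List, Sequence, Tuple


def normalise_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max rescale every dimension (column) to [0, 1] (an assumed scaling)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [
            (v - lo) / (hi - lo) if hi > lo else 0.5  # constant column: place it mid-axis
            for v, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


def polylines(rows: Sequence[Sequence[float]]) -> List[List[Tuple[float, float]]]:
    """For each item, return the (axis position, height) points its line passes through."""
    return [[(float(i), y) for i, y in enumerate(norm)] for norm in normalise_columns(rows)]


if __name__ == "__main__":
    data = [[1.0, 200.0, 3.0], [2.0, 100.0, 3.0], [3.0, 150.0, 3.0]]
    for line in polylines(data):
        print(line)
```

Drawing straight segments between consecutive points of each list yields the familiar parallel co-ordinates plot; items that are close in every dimension produce lines that run close together.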

Pixel-Based Methods. Pixel-based methods achieve a high information density by using individual pixels to represent fields in data items, or even whole items [44, 22]. A major problem to be addressed is how to assign a unique pixel to each item. Keim [44, 22] solves this problem in two steps. First, he imposes an order on the tuples. Second, he uses space-filling curves to map this one-dimensional ordering onto two dimensions.
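The two-step scheme can be sketched as follows, assuming the items are ordered by a single numeric value and that the space-filling curve is a Hilbert curve on an n x n grid (n a power of two). Both choices are illustrative assumptions rather than Keim's exact configuration, and the names `hilbert_d2xy` and `pixel_layout` are hypothetical. The curve conversion follows the well-known iterative Hilbert index-to-coordinate algorithm.

```python
from typing import Dict, List, Sequence, Tuple


def _rotate(s: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate/flip a sub-square of side s so that consecutive sub-curves join up."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map position d along a Hilbert curve to (x, y) on an n x n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def pixel_layout(values: Sequence[float], n: int) -> Dict[int, Tuple[int, int]]:
    """Step 1: order the items (here by value). Step 2: give each rank one pixel on the curve."""
    if len(values) > n * n:
        raise ValueError("grid too small for the number of items")
    order: List[int] = sorted(range(len(values)), key=lambda i: values[i])
    return {item: hilbert_d2xy(n, rank) for rank, item in enumerate(order)}


if __name__ == "__main__":
    print(pixel_layout([0.9, 0.1, 0.5, 0.3], 2))
```

A space-filling curve is used rather than a simple row-by-row scan because items adjacent in the ordering always land on adjacent pixels, so runs of similar values form compact regions in the picture.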

Spenke and Beilken [23] describe applying InfoZoom, an interactive, visualization-based data mining tool, to the large financial data set used for the PKDD'99 Discovery Challenge. InfoZoom uses a spreadsheet-like approach to interaction. When too many values are present to be shown numerically, each value is replaced by a single pixel, giving a scatter-plot-like view of the data. The visible data can be constrained to subsets with particular properties, or sorted by an attribute to reveal correlations. When a subset is selected, animation is used to show a gradual change between the values for the whole data set and those for the subset. Formulas can also be added to the “spreadsheet”, for example to calculate averages. InfoZoom can be used in either an exploratory or a hypothesis-driven manner, and appears simple enough for domain experts to use after training. The system seems best suited to smaller data sets; the example in [23] uses around 700 items.
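The interaction operations described above (constraining to a subset, sorting by an attribute, adding a formula such as an average, and animating from whole-set to subset values) can be illustrated on toy records. This is a minimal sketch of the operations only. It is not InfoZoom's code or API; the record fields and the functions `constrain`, `sort_by`, `average` and `transition` are hypothetical, and linear interpolation is merely an assumed way to produce the animation frames.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def constrain(rows: Sequence[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep only the items that have a particular property (select a subset)."""
    return [r for r in rows if predicate(r)]


def sort_by(rows: Sequence[Row], attr: str) -> List[Row]:
    """Sort by one attribute so that correlated attributes show up as trends."""
    return sorted(rows, key=lambda r: r[attr])


def average(rows: Sequence[Row], attr: str) -> float:
    """A spreadsheet-style formula: the mean of one attribute over the visible rows."""
    return sum(r[attr] for r in rows) / len(rows)


def transition(start: float, end: float, frames: int) -> List[float]:
    """Values for animating gradually from the whole-set value to the subset value."""
    if frames < 2:
        raise ValueError("need at least two frames")
    return [start + (end - start) * k / (frames - 1) for k in range(frames)]


if __name__ == "__main__":
    loans = [
        {"amount": 100.0, "months": 12.0},
        {"amount": 300.0, "months": 36.0},
        {"amount": 200.0, "months": 24.0},
    ]
    long_loans = constrain(loans, lambda r: r["months"] > 20)
    print(transition(average(loans, "amount"), average(long_loans, "amount"), 3))
```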

Dimension Reduction. Dimension reduction methods are normally applied to purely numeric data. They map the original dimensions (fields) of the data into a smaller number of numeric dimensions, while attempting to retain or illustrate relevant properties of the original space, such as distances between items or patterns in item distribution.
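One way to check how well a reduction retains distances between items is to compare all pairwise distances before and after it. The sketch below computes a simple stress measure, similar in form to Kruskal's stress-1 but comparing raw distances directly; the function name `raw_stress` is hypothetical, and this is one illustrative quality measure rather than one prescribed by the text.

```python
import math
from itertools import combinations
from typing import Sequence


def raw_stress(original: Sequence[Sequence[float]], reduced: Sequence[Sequence[float]]) -> float:
    """0.0 when every pairwise distance survives the reduction exactly; larger means more distortion."""
    num = den = 0.0
    for i, j in combinations(range(len(original)), 2):
        d_orig = math.dist(original[i], original[j])
        d_red = math.dist(reduced[i], reduced[j])
        num += (d_orig - d_red) ** 2
        den += d_orig ** 2
    return math.sqrt(num / den)


if __name__ == "__main__":
    bumpy = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (1.0, 0.0, 0.0)]
    # Dropping the third coordinate collapses the first two points onto each other.
    print(raw_stress(bumpy, [p[:2] for p in bumpy]))
```

If all the points already lie in the plane spanned by the kept dimensions, the stress is zero; otherwise it grows with the distance information that is lost.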

Projection maps positions in N dimensions into two (or three, or some other smaller number of) dimensions, in a similar way to a 3D object casting a shadow on a 2D surface. As the 3D object is rotated, its shadow changes, and different features of the object become visible.
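The shadow analogy corresponds to an orthogonal projection: each N-dimensional point is reduced to its dot products with two orthonormal vectors u and v that span the viewing plane. The sketch below, with hypothetical helpers `project` and `rotate`, shows two points that share a shadow until the data is rotated in the plane of coordinates 0 and 2.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project(points: Sequence[Vector], u: Vector, v: Vector) -> List[Tuple[float, float]]:
    """Shadow of N-dimensional points on the plane spanned by orthonormal u and v."""
    return [(dot(p, u), dot(p, v)) for p in points]


def rotate(points: Sequence[Vector], i: int, j: int, angle: float) -> List[List[float]]:
    """Rotate every point by `angle` radians in the plane of coordinates i and j."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for p in points:
        q = list(p)
        q[i], q[j] = c * p[i] - s * p[j], s * p[i] + c * p[j]
        out.append(q)
    return out


if __name__ == "__main__":
    u, v = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
    pts = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
    print(project(pts, u, v))                               # both points cast the same shadow
    print(project(rotate(pts, 0, 2, math.pi / 2), u, v))    # after rotation they separate
```

Rotating the data and keeping the plane fixed is equivalent to keeping the data fixed and turning the plane; interactive tools typically do the latter.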

The grand tour [26, 27] is an automatically generated sequence of projections from N dimensions that together show the data from almost every possible angle. However, watching such a sequence can be time-consuming, and many of the views will not be of interest.
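A tour can be sketched as a stream of 2D projection frames (pairs of orthonormal vectors) that drifts from one randomly chosen plane to the next. This is a simplified sketch under stated assumptions: published grand tour algorithms interpolate along geodesics between planes, whereas here the two vectors are interpolated linearly and then re-orthonormalised with Gram-Schmidt, and no guarantee of covering "almost every angle" is attempted. The names `orthonormalise`, `random_frame` and `grand_tour` are hypothetical.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple

Frame = Tuple[List[float], List[float]]


def _normalise(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def orthonormalise(a: Sequence[float], b: Sequence[float]) -> Frame:
    """Gram-Schmidt: turn two vectors into an orthonormal basis of a projection plane."""
    u = _normalise(a)
    proj = sum(x * y for x, y in zip(b, u))
    v = _normalise([y - proj * x for x, y in zip(u, b)])  # remove b's component along u
    return u, v


def random_frame(n: int, rng: random.Random) -> Frame:
    """A random 2D plane in n dimensions."""
    return orthonormalise([rng.gauss(0, 1) for _ in range(n)], [rng.gauss(0, 1) for _ in range(n)])


def grand_tour(n: int, targets: int, steps: int, seed: int = 0) -> Iterator[Frame]:
    """Yield `targets * steps` frames, moving gradually between random target planes."""
    rng = random.Random(seed)
    current = random_frame(n, rng)
    for _ in range(targets):
        target = random_frame(n, rng)
        for k in range(1, steps + 1):
            t = k / steps
            a = [(1 - t) * x + t * y for x, y in zip(current[0], target[0])]
            b = [(1 - t) * x + t * y for x, y in zip(current[1], target[1])]
            yield orthonormalise(a, b)
        current = target


if __name__ == "__main__":
    point = [1.0, 2.0, 3.0, 4.0, 5.0]
    for u, v in grand_tour(5, targets=2, steps=3, seed=1):
        # The 2D position of one 5D point in each successive view.
        print(sum(p * x for p, x in zip(point, u)), sum(p * y for p, y in zip(point, v)))
```

Each frame is used exactly like a fixed viewing plane: every data point is projected onto u and v, and successive frames are shown as an animation.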

Projection pursuit [5] automates the identification of interesting views. A heuristic measure, the “projection index”, is optimized, typically by simulated annealing, to find an “interesting” view. Different projection indices may, for example, maximize the clustering of points or spread the points over the 2D space. A number of variants of the original projection pursuit algorithm are discussed in [25]. XGobi [28, 30] and GGobi [31, 30] are the most commonly available implementations of the grand tour and projection pursuit.
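The optimization loop of projection pursuit can be sketched with simulated annealing over projection directions. Several simplifying assumptions apply: the search is over 1D directions rather than 2D planes, the hypothetical `spread_index` simply measures the variance of the projected points (so its optimum is the first principal component, whereas practical indices usually reward departures from normality such as clustering), and the cooling schedule and step size are arbitrary choices. The names `spread_index` and `pursue` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def spread_index(data: Sequence[Sequence[float]], w: Sequence[float]) -> float:
    """Hypothetical 'spread' index: variance of the data projected onto direction w."""
    proj = [sum(a * b for a, b in zip(row, w)) for row in data]
    mean = sum(proj) / len(proj)
    return sum((p - mean) ** 2 for p in proj) / len(proj)


def pursue(data: Sequence[Sequence[float]], iters: int = 3000, step: float = 0.3,
           t0: float = 1.0, seed: int = 0) -> List[float]:
    """Simulated annealing over unit directions; returns the best direction found."""
    rng = random.Random(seed)
    w = _unit([rng.gauss(0, 1) for _ in range(len(data[0]))])
    score = spread_index(data, w)
    best, best_score = w, score
    for k in range(iters):
        temp = t0 * (1 - k / iters) + 1e-9  # linear cooling schedule
        cand = _unit([x + rng.gauss(0, step) for x in w])  # random nearby direction
        cand_score = spread_index(data, cand)
        # Always accept better views; accept worse ones with probability exp(delta / temp).
        if cand_score >= score or rng.random() < math.exp((cand_score - score) / temp):
            w, score = cand, cand_score
            if score > best_score:
                best, best_score = w, score
    return best


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.uniform(-10, 10), rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
    print(pursue(data))  # should point (up to sign) roughly along the first axis
```

Accepting occasional worse views at high temperature lets the search escape local optima of the index; as the temperature falls, the search settles on the best view it has found. Swapping in a different index changes what kind of view is considered "interesting" without changing the loop.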
