Science at the Speed of Thought - Ambient Intelligence for Scientific Discovery


relevant and discard the rest. One recent winner of the Knowledge Discovery in

Databases competition reduced the number of features from the original 139,351

down to 4 features in his final model [37]. An alternative approach is summarizing.

For example, data can be clustered and the individual components can be studied

in various ways.
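
The idea of summarizing by clustering can be illustrated with a minimal sketch. It assumes plain k-means as a stand-in, which is not necessarily the method used by any of the tools discussed in this chapter. The function names (kmeans, summarize) and the toy points are made up for illustration.

```python
import math
import random
from statistics import mean

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels). Illustrative stand-in only.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return centroids, labels

def summarize(points, labels, k):
    """Per-cluster size and per-attribute mean, the 'summary' to study."""
    summary = {}
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        summary[c] = {
            "size": len(members),
            "mean": tuple(mean(col) for col in zip(*members)) if members else None,
        }
    return summary

# Toy data (hypothetical): two well-separated groups in two dimensions.
toy_points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
              (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
toy_centroids, toy_labels = kmeans(toy_points, k=2)
for cluster_id, stats in summarize(toy_points, toy_labels, 2).items():
    print(cluster_id, stats)
```

Each cluster is reduced to a size and a mean vector, so the individual components can then be inspected separately instead of the full point set.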

Machine learning tools can help with both dimension reduction and summarizing.

We use the Weka [38] implementations of many machine learning

algorithms. We also use the AutoClass [39] clustering software. In addition, we

have developed our own genetic programming software package, GPP [40]. We

also have our own equation discovery software [41]. Together these provide many

avenues for displaying, interacting with, and gaining insight into our results.

Our visualization of the Iris data set [42] contains multiple representations.

Figure 10-a shows part of our visualization. On the near side of the left wall

is a parallel coordinate plot [36] of the cluster identified with the transparent

envelope. On the far side of the left wall is a plot of the probability density

distribution of each of the attributes in the data set. The right wall shows how

the attributes rank with Information Gain [38]. In the foreground is a set of

statistics that have been computed on the fly in response to a user command.

The points of the data set are represented as glyphs where the attributes have

been mapped to glyph attributes using our glyph toolbox. The points are plotted

in the central cube. A user can also interact with this visualization by turning

the transparent envelopes of the clusters on and off individually, and the parallel

coordinate plots with them. Figure 10-b shows the same data set visualized in

three different ways, shown in three separate rooms.
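
Ranking attributes by Information Gain, as shown on the right wall, can be sketched as follows. This is a minimal illustration and not the Weka implementation: it assumes numeric attributes are discretized into equal-width bins, which may differ from how Weka discretizes them. The function names, the bin count, and the toy rows are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(values, labels, bins=3):
    """Information gain of one numeric attribute with respect to the class.

    Assumption: equal-width binning is used to discretize the attribute.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant attributes
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        b = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        groups[b].append(y)
    # Expected class entropy remaining after splitting on the binned attribute.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_attributes(rows, labels, names, bins=3):
    """Return (name, gain) pairs sorted from most to least informative."""
    scores = {
        name: information_gain([row[i] for row in rows], labels, bins)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data (hypothetical): attr_a separates the classes, attr_b does not.
toy_rows = [(1.0, 5.0), (1.2, 1.0), (0.9, 3.0),
            (5.0, 5.0), (5.1, 1.0), (4.9, 3.0)]
toy_labels = ["x", "x", "x", "y", "y", "y"]
for name, gain in rank_attributes(toy_rows, toy_labels, ["attr_a", "attr_b"]):
    print(f"{name}: {gain:.3f}")
```

An attribute whose bins contain a single class each receives the highest gain, while one whose bins mix the classes evenly receives a gain near zero, which is the ordering a ranking display of this kind conveys.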

Figure 10 helps to bring together all of the main components of our VL.

The visualization is run through a distributed computing environment, in which

multiple users can interact with the data. The figure demonstrates the interactive

IVE, in which users can move, hide, and select objects in the system to control the

display of the data and the movement of the data into and out of the visualization.

Figure 10 also displays results of our machine learning tools, used to analyze the

data and select which components to study. With all three of these components,

we can speed up concept development.

3 Applications

We speed up insight into our data through representation of the data in the

IVE and through interactions with the data. One representation may not be

sufficient, so the ability to switch between and interact with representations is

important. We describe a set of applications that highlight our approach.

3.1 Multi-modal Imaging and Visualization

In this project we are developing methods for combining related three-dimensional

data sets from a variety of sources into visualizations that enable exploration

and understanding of the data at a variety of scales and with a variety of
