Information Technology Reference
In-Depth Information
2 Approach
2.1
Visualization
The visualization engine used in the Haiku system provides an abstract 3-d
perspective of multi-dimensional data based on the Hyper system[7-9] for force
based visualization. The visualization consists of nodes and links (similar to a
ball-and-stick model, only dynamic), whose properties are given by the param-
eters of the data. Data elements affect parameters such as node size, mass, link
strength and elasticity, and so on. Multiple elements can affect one parameter,
or a subset of parameters can be chosen.
Many forms of data can be visualized in Haiku. Typical data for data mining
consists of a number of individual “items” (representing, for example, customers)
each with the same number of numerical and/or nominal attributes. This is
similar to standard dimension reduction methods used for solely numerical data
such as Projection Pursuit [5] and Multi Dimensional Scaling [6], but applicable
to data with a mix of nominal and numeric fields. What is required for Haiku
visualization is that a similarity can be calculated between any two items. The
similarity metric should match an intuitive view of the similarity of two items.
In most cases, a simple and standard distance measure performs well.
To create the visualization, nodes are initially scattered randomly into the
3d space, with their associated links. Movement in this space is determined by
a set of rules similar to the laws of physics. Links want to assume a particular
length, determined by their elasticity and the nodes they are connected to - and
the parameters of these are affected by the data. They pull inwards until they
reach that length, or push outwards if they are compressed, just as a spring does
in the real world. The more similar an item, the stronger the link connecting
them. Nodes repel each other, based on their mass. This whole approach can be
seen as a force directed graph visualization. This initial state is then allowed to
evolve, and the links and nodes shue themselves around until they reach a low
energy, steady state. Since the strongest forces are between the items that are
the most similar, they tend to cluster in the same area of space - items unrelated
to each other are not connected and hence occupy different areas of space. The
repulsive force between nodes is used to spread them out.
The physics of the space are adjustable, but are chosen so that a steady
state solution can be reached that is static - this is unlike the real world, in
which a steady state exists that involves motion, such as we see in planetary
orbits. Having a static steady state is easier to explore, since the visual system
is especially sensitive to motion and hence orbiting systems tend to become the
focus of attention.
The system effectively reduces the data dimensionality to 3D. However, un-
like traditional dimension reduction methods, there is no predefined mapping
between the higher and lower dimensional spaces.
Computationally, the process scales exponentially with the number of links,
which is usually proportional to the number of data points. For small data sets
(up to 1000 nodes) the process can be allowed to run in real time. For larger data
Search WWH ::




Custom Search