FastMap [32] can be used to reduce N-dimensional data sets to two or three
dimensions for visualization. Because the algorithm attempts to preserve distances
in the reduced space, clusters can be identified visually. FastMap's
advantage over other methods of transforming multidimensional data into a 3D
space is its speed: O(Nk) for N objects projected into a k-dimensional space.
One disadvantage is that, compared with Multidimensional Scaling (MDS), it
preserves the distances between objects less well when they are projected
into a lower-dimensional space. Another disadvantage is that FastMap uses a line
between two widely separated objects (“pivot objects”) to provide the axis for
the first reduced dimension. Other axes are then placed perpendicular to this one.
If the pivot objects chosen are atypical (outliers), this initial axis could be
unsuitable.
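The projection onto pivot axes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from [32]: it assumes Euclidean distances between the input points, and the function name `fastmap` and the simple pivot heuristic (start from the first object, take the object furthest from it, then the object furthest from that one) are choices made here for illustration.

```python
import math


def fastmap(points, k):
    """Project points (equal-length tuples of numbers) into k dimensions."""
    n = len(points)
    coords = [[0.0] * k for _ in range(n)]

    def dist2(i, j, dim):
        # Squared Euclidean distance between objects i and j, minus the part
        # already accounted for by the axes computed so far (columns < dim).
        d2 = sum((p - q) ** 2 for p, q in zip(points[i], points[j]))
        for c in range(dim):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)  # guard against small negative rounding errors

    def farthest(i, dim):
        return max(range(n), key=lambda j: dist2(i, j, dim))

    for dim in range(k):
        # Assumed pivot heuristic: two hops from an arbitrary start object.
        b = farthest(0, dim)
        a = farthest(b, dim)
        dab2 = dist2(a, b, dim)
        if dab2 == 0.0:
            break  # no residual distance left; remaining axes stay at zero
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine law: position of object i along the line through a and b.
            coords[i][dim] = (dist2(a, i, dim) + dab2 - dist2(b, i, dim)) / (2 * dab)
    return coords


if __name__ == "__main__":
    corners = [(0, 0), (3, 0), (0, 4), (3, 4)]
    for row in fastmap(corners, 2):
        print(row)
```

Each axis needs only a linear number of distance evaluations, which is where the O(Nk) cost comes from. Because both pivots are chosen as extreme objects, a single outlier can end up defining the first axis, which is the weakness noted above.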
Despite these possible problems, FastMap gives good visualizations on both
synthetic and real world data, with clusters and patterns in the data clearly
visible. Its scalability means that it is suitable for interactive use with large
amounts of data.
Network Visualization. Netmap [33] is a successful commercial data mining
visualization package. Typical applications include fraud identification, criminal
investigation, and social network analysis.
Nodes representing entities in the database are placed around the perimeter
of a circle. Links representing relationships between entities are drawn across
the circle. Nodes can be grouped so that members of a group appear next to each
other. Nodes and links can be labeled and pruned according to their attributes.
Nodes with few links can be removed, reducing clutter.
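The layout idea described here (nodes on a circle, grouped nodes adjacent, sparsely linked nodes pruned) can be sketched as follows. This is only an illustration of the general technique and says nothing about how Netmap itself is implemented; the function `circular_layout` and its parameters `groups`, `min_links`, and `radius` are hypothetical names introduced here.

```python
import math
from collections import Counter


def circular_layout(nodes, links, groups=None, min_links=0, radius=1.0):
    """Return (positions, kept_links) for a circular node-link diagram.

    nodes: list of node ids; links: list of (u, v) pairs;
    groups: optional dict mapping node id -> group label (hypothetical input).
    """
    # Count how many links touch each node.
    degree = Counter()
    for u, v in links:
        degree[u] += 1
        degree[v] += 1

    # Prune nodes with fewer than min_links links to reduce clutter.
    kept = [n for n in nodes if degree[n] >= min_links]

    # A stable sort by group label places members of a group side by side.
    if groups is not None:
        kept.sort(key=lambda n: str(groups.get(n, "")))

    # Spread the remaining nodes evenly around the perimeter of the circle.
    positions = {
        n: (radius * math.cos(2 * math.pi * i / len(kept)),
            radius * math.sin(2 * math.pi * i / len(kept)))
        for i, n in enumerate(kept)
    }

    # Keep only links whose two endpoints both survived pruning.
    kept_links = [(u, v) for u, v in links if u in positions and v in positions]
    return positions, kept_links
```

Drawing each kept link as a straight chord between the two returned positions gives the "links across the circle" picture described above.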
4.3 Discovery Visualization
Discovery visualization is important: as data mining systems grow more powerful,
we risk entering an era of “discovery overload” while trying to solve our
“data overload” problems. There has been comparatively little work on the
visualization of discoveries, and much of that work tends to be limited to
specialized forms of rules or is unable to cope with even modest numbers of
discoveries. Below we briefly outline some of the approaches that have been tried
and identify some of their strengths and weaknesses. Another useful review of
visualization for clustering and association discoveries can be found in
Celar et al. [41].
Two Antecedent Association Rules. Fukuda et al. [34] present a system for
visualization of individual rules. However, it is limited to rules with two numeric
conditions and a single Boolean conclusion. For example:
18 < Age < 25 and 10,000 < Income < 25,000 implies Bad risk = true
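A rule of this shape can be scored against a table of records as sketched below. The sketch makes several assumptions that are not in the source: records are dictionaries with the hypothetical keys `age`, `income`, and `bad_risk`; both numeric conditions are read as strict inequalities; confidence is the fraction of matching records for which the conclusion holds; and support is the fraction of all records that satisfy both the conditions and the conclusion, which is one common definition among several.

```python
def rule_stats(records, age_range, income_range, target="bad_risk"):
    """Return (support, confidence) for a two-condition numeric rule.

    records: list of dicts with "age", "income" and a Boolean target field
    (hypothetical field names chosen for this illustration).
    """
    lo_age, hi_age = age_range
    lo_inc, hi_inc = income_range

    # Records that satisfy both numeric conditions (the antecedent).
    matching = [
        r for r in records
        if lo_age < r["age"] < hi_age and lo_inc < r["income"] < hi_inc
    ]
    # Matching records for which the Boolean conclusion also holds.
    hits = sum(1 for r in matching if r[target])

    support = hits / len(records) if records else 0.0
    confidence = hits / len(matching) if matching else 0.0
    return support, confidence


if __name__ == "__main__":
    sample = [
        {"age": 20, "income": 15000, "bad_risk": True},
        {"age": 24, "income": 20000, "bad_risk": False},
        {"age": 30, "income": 15000, "bad_risk": True},
    ]
    print(rule_stats(sample, (18, 25), (10000, 25000)))
```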
Rules are visualized on a plane, with one axis representing each numerical
attribute. The plane is divided into a number of pixels by partitioning the values
into fixed-size buckets. Pixel color represents rule confidence, with redder pixels
representing more confident rules. Pixel brightness approximates support. A
good rule appears as a bright red region of the plane. Regions meeting a thresh-