Interactive Comprehensible Data Mining - Ambient Intelligence for Scientific Discovery


FastMap [32] can be used to reduce N-dimensional data sets to two or three
dimensions for visualization. Because the algorithm attempts to preserve
distances in the reduced space, clusters can be identified visually. FastMap's
advantage over other methods of transforming multidimensional data into a 3D
space is its speed: O(Nk) for N objects projected into a k-dimensional space.
One disadvantage is that, compared with Multidimensional Scaling (MDS), it
preserves the distances between objects less well when they are projected into
a lower-dimensional space. Another disadvantage is that FastMap uses a line
between two widely separated objects (“pivot objects”) as the axis for the
first reduced dimension. Other axes are then placed perpendicular to this one.
If the pivot objects chosen are atypical (outliers), this initial axis could be
unsuitable.
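
The pivot-and-project procedure described above can be sketched in a few lines
of Python. This is a minimal illustration, not the implementation from [32]: it
assumes Euclidean input vectors and a simple pivot heuristic (start at an
arbitrary object, jump to the object farthest from it, then to the object
farthest from that one). The function name `fastmap` and its parameters are
hypothetical, and the code is not tuned to reach the O(Nk) cost quoted above.

```python
import math


def fastmap(points, k, dist=math.dist):
    """Project objects into k dimensions with a FastMap-style procedure.

    points: list of equal-length numeric tuples (assumed Euclidean data).
    dist:   distance function between two objects (hypothetical parameter).
    """
    n = len(points)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i, j, col):
        # Squared distance left over after removing the part already
        # explained by the first `col` axes; clipped at zero for safety.
        rest = dist(points[i], points[j]) ** 2
        for c in range(col):
            rest -= (coords[i][c] - coords[j][c]) ** 2
        return max(rest, 0.0)

    for col in range(k):
        # Pivot heuristic: two objects that are far apart in the residual space.
        a = 0
        b = max(range(n), key=lambda j: d2(a, j, col))
        a = max(range(n), key=lambda j: d2(b, j, col))
        dab2 = d2(a, b, col)
        if dab2 < 1e-12:
            break  # nothing left to separate; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine law: position of object i along the line through a and b.
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords


# Usage: two small groups of 4-dimensional points reduced to 2D for plotting.
data = [(0, 0, 0, 0), (0.1, 0, 0, 0), (5, 5, 5, 5), (5, 5.1, 5, 5)]
for original, reduced in zip(data, fastmap(data, 2)):
    print(original, "->", [round(x, 3) for x in reduced])
```

Because both pivots are taken from the data itself, a single outlier can end up
defining the first axis, which is the weakness noted above.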

Despite these possible problems, FastMap gives good visualizations on both
synthetic and real-world data, with clusters and patterns in the data clearly
visible. Its scalability makes it suitable for interactive use with large
amounts of data.

Network Visualization. Netmap [33] is a successful commercial data mining
visualization package. Typical applications include fraud identification,
criminal investigation, and social network analysis.

Nodes representing entities in the database are placed around the perimeter of
a circle. Links representing relationships between entities are drawn across
the circle. Nodes can be grouped, in which case they appear next to each other.
Nodes and links can be labeled and pruned according to their attributes. Nodes
with few links can be removed, reducing clutter.
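
The layout described above can be illustrated with a short Python sketch. It is
not Netmap's algorithm or API, only an assumption-laden approximation: nodes
are spaced evenly on a circle, grouped nodes are made adjacent by sorting on a
group label, and nodes with fewer than `min_links` links are dropped. The names
`circular_layout`, `group_of`, and `min_links` are hypothetical.

```python
import math
from collections import Counter


def circular_layout(nodes, links, group_of=None, min_links=1, radius=1.0):
    """Return ({node: (x, y)}, surviving_links) for a circular link diagram."""
    # Count how many links touch each node.
    degree = Counter()
    for u, v in links:
        degree[u] += 1
        degree[v] += 1

    # Prune sparsely connected nodes to reduce clutter.
    kept = [n for n in nodes if degree[n] >= min_links]

    # Sorting by group label places members of a group side by side
    # (the sort is stable, so the original order is kept within a group).
    if group_of is not None:
        kept.sort(key=lambda n: str(group_of.get(n, "")))

    # Space the remaining nodes evenly around the perimeter.
    positions = {}
    for i, n in enumerate(kept):
        angle = 2 * math.pi * i / len(kept)
        positions[n] = (radius * math.cos(angle), radius * math.sin(angle))

    # Keep only links whose two endpoints both survived pruning.
    kept_links = [(u, v) for u, v in links if u in positions and v in positions]
    return positions, kept_links


# Usage with made-up entities: "e" has no links and is pruned.
nodes = ["a", "b", "c", "d", "e"]
links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
groups = {"a": "g1", "b": "g2", "c": "g1", "d": "g2"}
positions, drawn = circular_layout(nodes, links, group_of=groups)
for name, (x, y) in positions.items():
    print(name, round(x, 2), round(y, 2))
```

Each surviving link would then be drawn as a straight line between the two
positions, crossing the interior of the circle.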

4.3 Discovery Visualization

Discovery visualization is important; as data mining systems grow more
powerful, we risk entering the era of “discovery overload” while trying to
solve our “data overload” problems. There has been comparatively little work on
the visualization of discoveries, and much of it is limited to specialized
forms of rules or cannot cope with even modest numbers of discoveries. Below we
briefly outline some of the approaches that have been tried and identify some
of their strengths and weaknesses. Another useful review of visualization for
clustering and association discoveries can be found in Celar et al. [41].

Two Antecedent Association Rules. Fukuda et al. [34] present a system for
visualization of individual rules. However, it is limited to rules with two
numeric conditions and a single boolean conclusion. For example:

18 < Age < 25 and 10,000 ≤ Income < 25,000 Implies Bad risk = true
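
To make the example rule concrete, the following Python sketch evaluates it
against a handful of records. The records, the field names (`age`, `income`,
`bad_risk`), and the function `rule_stats` are invented for illustration.
Support is taken here as the fraction of all records that satisfy both
conditions and the conclusion, and confidence as the fraction of records
satisfying both conditions for which the conclusion also holds; these are the
usual association-rule definitions, and the system in [34] may define them
differently.

```python
def rule_stats(records, age_range=(18, 25), income_range=(10_000, 25_000)):
    """Return (support, confidence) for
    age_lo < Age < age_hi and inc_lo <= Income < inc_hi  =>  Bad risk = true.
    """
    age_lo, age_hi = age_range
    inc_lo, inc_hi = income_range

    # Records that satisfy both numeric conditions (the antecedent).
    matches = [
        r for r in records
        if age_lo < r["age"] < age_hi and inc_lo <= r["income"] < inc_hi
    ]
    # Of those, the records for which the boolean conclusion also holds.
    hits = sum(1 for r in matches if r["bad_risk"])

    support = hits / len(records) if records else 0.0
    confidence = hits / len(matches) if matches else 0.0
    return support, confidence


# Hypothetical records, made up for this example only.
records = [
    {"age": 20, "income": 15_000, "bad_risk": True},
    {"age": 22, "income": 12_000, "bad_risk": True},
    {"age": 24, "income": 20_000, "bad_risk": False},
    {"age": 30, "income": 15_000, "bad_risk": True},   # age outside the range
    {"age": 19, "income": 30_000, "bad_risk": False},  # income outside the range
]
support, confidence = rule_stats(records)
print(f"support={support:.2f} confidence={confidence:.2f}")
```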

Rules are visualized on a plane, with one axis representing each numerical
attribute. The plane is divided into a number of pixels by partitioning the
values into fixed-size buckets. Pixel color represents rule confidence, with
redder pixels representing more confident rules. Pixel brightness approximates
support. A good rule appears as a bright red region of the plane. Regions
meeting a thresh-
