Interactive Comprehensible Data Mining - Ambient Intelligence for Scientific Discovery

Information Technology Reference

In-Depth Information

3 Case Studies

3.1

Case Study 1: Interactive Data Mining of Housing Data

The Boston Housing Data [3] is a classic, well-known dataset available from the

UCI Machine Learning repository [2]. Haiku was used to visualize the data. The

complex clustering shown in Fig. 3 was revealed.

Two fairly distinct groups of data are visible, which show smaller internal

features such as sub-groups. The two main groups were selected using the mouse,

and short, accurate, classification rules were requested from the data mining sys-

tem. These rules are shown:

Bounds river = true ⇒ GROUP 1

Accuracy: 100% Coverage:43%

PropLargeDevelop = 0.0 AND 9.9 ≤ older properties percent

≤ 100.0 AND Pupil teacher ratio = 20.2 ⇒ GROUP 1

Accuracy: 94% Coverage: 83%

Bounds river = false AND 4 ≤ Highway access ≤ 8 ⇒ GROUP 2

Accuracy: 100% Coverage: 77%

Bounds river = false AND 264 ≤ Tax rate ≤ 403 ⇒ GROUP 2

Accuracy: 100% Coverage:69%

2.02 < Industry proportion ≤ 3.41 ⇒ GROUP 2

Accuracy:98% Coverage: 13%

Fig. 3. Clustering of Boston Housing Data.

Search WWH ::

Custom Search

Home