3 Case Studies
3.1 Case Study 1: Interactive Data Mining of Housing Data
The Boston Housing Data [3] is a classic dataset available from the
UCI Machine Learning repository [2]. Visualizing the data with Haiku
revealed the complex clustering shown in Fig. 3.
Two fairly distinct groups of data are visible, each showing smaller internal
features such as sub-groups. The two main groups were selected using the mouse,
and short, accurate classification rules were requested from the data mining
system. The rules returned were:
Bounds river = true ⇒ GROUP 1
Accuracy: 100% Coverage: 43%

PropLargeDevelop = 0.0 AND 9.9 ≤ older properties percent ≤ 100.0 AND Pupil teacher ratio = 20.2 ⇒ GROUP 1
Accuracy: 94% Coverage: 83%

Bounds river = false AND 4 ≤ Highway access ≤ 8 ⇒ GROUP 2
Accuracy: 100% Coverage: 77%

Bounds river = false AND 264 ≤ Tax rate ≤ 403 ⇒ GROUP 2
Accuracy: 100% Coverage: 69%

2.02 < Industry proportion ≤ 3.41 ⇒ GROUP 2
Accuracy: 98% Coverage: 13%
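
The text does not define the accuracy and coverage figures attached to each rule. The sketch below assumes the definitions common in rule induction: accuracy is the fraction of records matching a rule's condition that fall in the target group, and coverage is the fraction of the target group's records that the rule matches. The `Rule` class, the `evaluate_rule` function, the field names, and the toy records are all hypothetical illustrations. They are not Haiku's implementation and are not taken from the Boston Housing Data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


@dataclass
class Rule:
    """A classification rule: IF condition(record) THEN record is in target_group."""
    description: str
    condition: Callable[[Record], bool]
    target_group: int


def evaluate_rule(rule: Rule, records: List[Record],
                  group_key: str = "group") -> Tuple[float, float]:
    """Return (accuracy, coverage) under the assumed definitions.

    accuracy = matched records in the target group / all matched records
    coverage = matched records in the target group / all records in the target group
    """
    matched = [r for r in records if rule.condition(r)]
    in_group = [r for r in records if r[group_key] == rule.target_group]
    correct = [r for r in matched if r[group_key] == rule.target_group]
    accuracy = len(correct) / len(matched) if matched else 0.0
    coverage = len(correct) / len(in_group) if in_group else 0.0
    return accuracy, coverage


# Made-up records for illustration only; "group" stands in for the
# cluster the user selected with the mouse in the visualization.
records: List[Record] = [
    {"bounds_river": True,  "highway_access": 5,  "group": 1},
    {"bounds_river": True,  "highway_access": 24, "group": 1},
    {"bounds_river": False, "highway_access": 3,  "group": 1},
    {"bounds_river": False, "highway_access": 4,  "group": 2},
    {"bounds_river": False, "highway_access": 8,  "group": 2},
    {"bounds_river": False, "highway_access": 24, "group": 2},
]

# Two rules with the same shape as those listed above.
river_rule = Rule(
    "Bounds river = true => GROUP 1",
    lambda r: r["bounds_river"] is True,
    target_group=1,
)
highway_rule = Rule(
    "Bounds river = false AND 4 <= Highway access <= 8 => GROUP 2",
    lambda r: (not r["bounds_river"]) and 4 <= r["highway_access"] <= 8,
    target_group=2,
)

for rule in (river_rule, highway_rule):
    acc, cov = evaluate_rule(rule, records)
    print(f"{rule.description}: accuracy={acc:.0%}, coverage={cov:.0%}")
```

Under these definitions a rule can be fully accurate while covering only part of its group, as with the first rule listed above (100% accuracy, 43% coverage). This is why several rules are reported for each group.
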
Fig. 3. Clustering of Boston Housing Data.