age            buys_computer
               yes    no
youth          2      3
middle_aged    4      0
senior         3      2

income         buys_computer
               yes    no
low            3      1
medium         4      2
high           2      2

student        buys_computer
               yes    no
yes            6      1
no             3      4

credit_rating  buys_computer
               yes    no
fair           6      2
excellent      3      3
Figure 8.8 The use of data structures to hold aggregate information regarding the training data (e.g.,
these AVC-sets describing Table 8.1's data) is one approach to improving the scalability of
decision tree induction.
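To make the figure concrete, the following minimal sketch shows how AVC-sets could be accumulated in a single pass over the training tuples. The function name build_avc_sets and the nested-dictionary layout are illustrative choices, not anything prescribed by the text, and the 14 rows are transcribed from Table 8.1 as an assumption (they do reproduce the counts shown in Figure 8.8).

```python
from collections import defaultdict

ATTRIBUTES = ["age", "income", "student", "credit_rating"]

# Assumed transcription of Table 8.1:
# (age, income, student, credit_rating, buys_computer)
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]


def build_avc_sets(rows, attributes):
    """Return {attribute: {value: {class_label: count}}} from one pass over rows."""
    avc = {a: defaultdict(lambda: defaultdict(int)) for a in attributes}
    for row in rows:
        *values, label = row  # the last field is the class label
        for attr, value in zip(attributes, values):
            avc[attr][value][label] += 1
    return avc


avc_sets = build_avc_sets(ROWS, ATTRIBUTES)
for attr in ATTRIBUTES:
    print(attr)
    for value, counts in avc_sets[attr].items():
        print(f"  {value:12s} yes={counts['yes']} no={counts['no']}")
```

The size of each AVC-set depends on the number of distinct attribute values and class labels rather than on the number of tuples, which is why such a structure can fit in memory even when the training data do not.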
BOAT can use any attribute selection measure that selects binary splits and that is
based on the notion of purity of partitions such as the Gini index. BOAT uses a lower
bound on the attribute selection measure to detect if this “very good” tree, T′, is different
from the “real” tree, T, that would have been generated using all of the data. It refines
T′ to arrive at T.
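As an illustration of the kind of measure meant here, the sketch below evaluates the Gini index of every binary split of each attribute, working only from the class counts in the AVC-sets of Figure 8.8. It is not BOAT itself and does not implement its lower bound; the helper names gini and best_binary_split are hypothetical.

```python
from itertools import combinations

# Class counts (yes, no) per attribute value, copied from Figure 8.8.
AVC = {
    "age": {"youth": (2, 3), "middle_aged": (4, 0), "senior": (3, 2)},
    "income": {"low": (3, 1), "medium": (4, 2), "high": (2, 2)},
    "student": {"yes": (6, 1), "no": (3, 4)},
    "credit_rating": {"fair": (6, 2), "excellent": (3, 3)},
}


def gini(counts):
    """Gini impurity of a partition given its per-class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def best_binary_split(avc_set):
    """Return (weighted Gini, left-hand value subset) of the purest binary split."""
    values = sorted(avc_set)
    n_total = sum(sum(c) for c in avc_set.values())
    best = None
    # Every non-empty proper subset defines a split; each split is visited
    # twice (subset and complement), which is redundant but harmless.
    for size in range(1, len(values)):
        for left in combinations(values, size):
            left_counts = [sum(avc_set[v][i] for v in left) for i in range(2)]
            right_counts = [
                sum(avc_set[v][i] for v in values if v not in left)
                for i in range(2)
            ]
            weighted = (
                sum(left_counts) * gini(left_counts)
                + sum(right_counts) * gini(right_counts)
            ) / n_total
            if best is None or weighted < best[0]:
                best = (weighted, set(left))
    return best


for attr, avc_set in AVC.items():
    score, left = best_binary_split(avc_set)
    print(f"{attr:14s} gini={score:.3f} split={sorted(left)} vs. the rest")
```

A lower weighted Gini index means purer partitions, so the attribute and subset with the smallest value would be the preferred binary split under this measure.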
BOAT usually requires only two scans of D. This is quite an improvement, even
in comparison to traditional decision tree algorithms (e.g., the basic algorithm in
Figure 8.3), which require one scan per tree level! BOAT was found to be two to three
times faster than RainForest, while constructing exactly the same tree. An additional
advantage of BOAT is that it can be used for incremental updates. That is, BOAT can
take new insertions and deletions for the training data and update the decision tree to
reflect these changes, without having to reconstruct the tree from scratch.
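BOAT's own update procedure is not described in this section, so the following is only a simplified sketch of the general idea that aggregate class counts can absorb insertions and deletions without rescanning the data. The AvcSet class and its methods are hypothetical, and the starting counts are those of the student AVC-set in Figure 8.8.

```python
from collections import Counter


class AvcSet:
    """Class-label counts per attribute value for a single attribute."""

    def __init__(self, counts=None):
        # Keys are (attribute_value, class_label) pairs.
        self.counts = Counter(counts or {})

    def insert(self, value, label):
        """Account for one newly inserted training tuple."""
        self.counts[(value, label)] += 1

    def delete(self, value, label):
        """Account for one deleted training tuple."""
        key = (value, label)
        if self.counts[key] <= 0:
            raise ValueError(f"no counted tuple with {key}")
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]  # keep the table free of zero entries


# student AVC-set from Figure 8.8: (student, buys_computer) -> count
student = AvcSet({
    ("yes", "yes"): 6, ("yes", "no"): 1,
    ("no", "yes"): 3, ("no", "no"): 4,
})

student.insert("no", "no")   # a hypothetical new tuple arrives
student.delete("yes", "no")  # a hypothetical existing tuple is removed
print(dict(student.counts))
```

After such updates, a split criterion can be re-evaluated on the adjusted counts; deciding whether the tree itself must change is the part that an algorithm such as BOAT has to handle.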
8.2.5 Visual Mining for Decision Tree Induction
“Are there any interactive approaches to decision tree induction that allow us to
visualize the data and the tree as it is being constructed? Can we use any knowledge of our
data to help in building the tree?” In this section, you will learn about an approach to
decision tree induction that supports these options. Perception-based classification
(PBC) is an interactive approach based on multidimensional visualization techniques
and allows the user to incorporate background knowledge about the data when building
a decision tree. By visually interacting with the data, the user is also likely to develop a
deeper understanding of the data. The resulting trees tend to be smaller than those built
using traditional decision tree induction methods and so are easier to interpret, while
achieving about the same accuracy.
“How can the data be visualized to support interactive decision tree construction?”
PBC uses a pixel-oriented approach to view multidimensional data with its class label
 