8.6 Why is naïve Bayesian classification called “naïve”? Briefly outline the major ideas of naïve Bayesian classification.
8.7 The following table consists of training data from an employee database. The data have been generalized. For example, “31...35” for age represents the age range of 31 to 35. For a given row entry, count represents the number of data tuples having the values for department, status, age, and salary given in that row.
department   status   age       salary      count
sales        senior   31...35   46K...50K   30
sales        junior   26...30   26K...30K   40
sales        junior   31...35   31K...35K   40
systems      junior   21...25   46K...50K   20
systems      senior   31...35   66K...70K   5
systems      junior   26...30   46K...50K   3
systems      senior   41...45   66K...70K   3
marketing    senior   36...40   46K...50K   10
marketing    junior   31...35   41K...45K   4
secretary    senior   46...50   36K...40K   4
secretary    junior   26...30   26K...30K   6
Let status be the class label attribute.
(a) How would you modify the basic decision tree algorithm to take into consideration the count of each generalized data tuple (i.e., of each row entry)?
(b) Use your algorithm to construct a decision tree from the given data.
(c) Given a data tuple having the values “systems,” “26...30,” and “46K...50K” for the attributes department, age, and salary, respectively, what would a naïve Bayesian classification of the status for the tuple be?
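
For part (c), one possible way to carry the counts into a naïve Bayesian classifier is to let each row contribute its count, rather than 1, to the class and attribute-value frequencies. The following is a minimal Python sketch under that assumption. The names ROWS, train, and classify are illustrative and do not come from the text, and no Laplacian correction is applied, so a value never observed with a class gives that class a score of zero.

```python
from collections import defaultdict

# Rows of the generalized table: (department, status, age, salary, count).
ROWS = [
    ("sales", "senior", "31...35", "46K...50K", 30),
    ("sales", "junior", "26...30", "26K...30K", 40),
    ("sales", "junior", "31...35", "31K...35K", 40),
    ("systems", "junior", "21...25", "46K...50K", 20),
    ("systems", "senior", "31...35", "66K...70K", 5),
    ("systems", "junior", "26...30", "46K...50K", 3),
    ("systems", "senior", "41...45", "66K...70K", 3),
    ("marketing", "senior", "36...40", "46K...50K", 10),
    ("marketing", "junior", "31...35", "41K...45K", 4),
    ("secretary", "senior", "46...50", "36K...40K", 4),
    ("secretary", "junior", "26...30", "26K...30K", 6),
]


def train(rows):
    """Accumulate count-weighted class totals and (attribute, value, class) totals."""
    class_totals = defaultdict(int)
    value_totals = defaultdict(int)
    for dept, status, age, salary, count in rows:
        class_totals[status] += count
        for attr, value in (("department", dept), ("age", age), ("salary", salary)):
            value_totals[(attr, value, status)] += count
    return class_totals, value_totals


def classify(query, class_totals, value_totals):
    """Return the best label and the unnormalized score P(C) * prod P(x_k | C) per class."""
    n = sum(class_totals.values())
    scores = {}
    for label, total in class_totals.items():
        score = total / n  # prior P(C), weighted by row counts
        for attr, value in query.items():
            # Conditional P(value | C); unseen combinations count as 0 (no smoothing).
            score *= value_totals.get((attr, value, label), 0) / total
        scores[label] = score
    return max(scores, key=scores.get), scores


class_totals, value_totals = train(ROWS)
query = {"department": "systems", "age": "26...30", "salary": "46K...50K"}
label, scores = classify(query, class_totals, value_totals)
print(label, scores)
```

With these counts, no senior tuple falls in the 26...30 age range, so the senior score is zero and the sketch labels the tuple junior. Whether a Laplacian correction should be applied before drawing that conclusion is a design choice the sketch leaves open.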
8.8 RainForest is a scalable algorithm for decision tree induction. Develop a scalable naïve Bayesian classification algorithm that requires just a single scan of the entire data set for most databases. Discuss whether such an algorithm can be refined to incorporate boosting to further enhance its classification accuracy.
8.9 Design an efficient method that performs effective naïve Bayesian classification over an infinite data stream (i.e., you can scan the data stream only once). If we wanted to discover the evolution of such classification schemes (e.g., comparing the classification scheme at this moment with earlier schemes such as one from a week ago), what modified design would you suggest?
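
One way to approach this, sketched below in Python, relies on the fact that a naïve Bayesian model over categorical attributes is fully determined by its class counts and attribute-value counts, both of which can be updated one tuple at a time without storing the stream. The class StreamingNaiveBayes, its method names, the add-one smoothing, and the toy weather tuples are all assumptions made for illustration. Saving copies of the counters at chosen moments is only one possible design for comparing models over time.

```python
from collections import Counter, defaultdict


class StreamingNaiveBayes:
    """Hypothetical one-pass naive Bayes for categorical attributes."""

    def __init__(self):
        self.class_counts = Counter()    # label -> tuples seen
        self.value_counts = Counter()    # (attr index, value, label) -> tuples seen
        self.domains = defaultdict(set)  # attr index -> distinct values seen
        self.snapshots = {}              # snapshot name -> copies of both counters

    def update(self, values, label):
        """Fold one stream tuple into the counts; the tuple itself is not stored."""
        self.class_counts[label] += 1
        for i, value in enumerate(values):
            self.value_counts[(i, value, label)] += 1
            self.domains[i].add(value)

    def snapshot(self, name):
        """Save the current counts so a later model can be compared with this one."""
        self.snapshots[name] = (self.class_counts.copy(), self.value_counts.copy())

    def counts_since(self, name):
        """Counts contributed only by tuples that arrived after the named snapshot."""
        old_classes, old_values = self.snapshots[name]
        return self.class_counts - old_classes, self.value_counts - old_values

    def predict(self, values, counts=None):
        """Most probable label under add-one smoothing, from the given or current counts."""
        class_counts, value_counts = counts or (self.class_counts, self.value_counts)
        n = sum(class_counts.values())
        best_label, best_score = None, -1.0
        for label, c in class_counts.items():
            score = c / n
            for i, value in enumerate(values):
                score *= (value_counts[(i, value, label)] + 1) / (c + len(self.domains[i]))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stream (made-up data): the class of ("sunny", "hot") drifts after the snapshot.
model = StreamingNaiveBayes()
for _ in range(3):
    model.update(("sunny", "hot"), "no")
model.update(("rainy", "cool"), "yes")
model.snapshot("week1")
for _ in range(5):
    model.update(("sunny", "hot"), "yes")

earlier = model.predict(("sunny", "hot"), model.snapshots["week1"])
current = model.predict(("sunny", "hot"))
recent = model.predict(("sunny", "hot"), model.counts_since("week1"))
print(earlier, current, recent)
```

Because the counters only grow with the number of distinct attribute-value-class combinations, memory does not depend on the length of the stream. The snapshots do accumulate, however, so a practical design would need some policy for bounding them, for example keeping fewer snapshots for older periods.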
8.10 Show that accuracy is a function of sensitivity and specificity, that is, prove Eq. (8.25).
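
Eq. (8.25) is not reproduced in this excerpt, so the outline below assumes it is the usual weighted form, with the standard definitions sensitivity = TP/P, specificity = TN/N, and accuracy = (TP + TN)/(P + N), where P and N are the numbers of positive and negative tuples and TP and TN are the true positives and true negatives. Under those assumptions, the argument is a matter of splitting the fraction and multiplying each term by P/P or N/N:

```latex
\begin{align*}
\text{accuracy} &= \frac{TP + TN}{P + N} \\
                &= \frac{TP}{P + N} + \frac{TN}{P + N} \\
                &= \frac{TP}{P} \cdot \frac{P}{P + N} + \frac{TN}{N} \cdot \frac{N}{P + N} \\
                &= \text{sensitivity} \cdot \frac{P}{P + N}
                 + \text{specificity} \cdot \frac{N}{P + N}
\end{align*}
```

Read this way, accuracy is the average of sensitivity and specificity weighted by the share of positive and negative tuples in the data.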
8.11 The harmonic mean is one of several kinds of averages. Chapter 2 discussed how to compute the arithmetic mean, which is what most people typically think of when they compute an average. The harmonic mean, $H$, of the positive real numbers, $x_1, x_2, \ldots, x_n$,