8.6 Why is naïve Bayesian classification called “naïve”? Briefly outline the major ideas of
naïve Bayesian classification.
8.7 The following table consists of training data from an employee database. The data have
been generalized. For example, “31...35” for age represents the age range of 31 to 35.
For a given row entry, count represents the number of data tuples having the values for
department, status, age, and salary given in that row.
department   status   age       salary      count
sales        senior   31...35   46K...50K   30
sales        junior   26...30   26K...30K   40
sales        junior   31...35   31K...35K   40
systems      junior   21...25   46K...50K   20
systems      senior   31...35   66K...70K   5
systems      junior   26...30   46K...50K   3
systems      senior   41...45   66K...70K   3
marketing    senior   36...40   46K...50K   10
marketing    junior   31...35   41K...45K   4
secretary    senior   46...50   36K...40K   4
secretary    junior   26...30   26K...30K   6
Let status be the class label attribute.
(a) How would you modify the basic decision tree algorithm to take into consideration
the count of each generalized data tuple (i.e., of each row entry)?
(b) Use your algorithm to construct a decision tree from the given data.
(c) Given a data tuple having the values “systems,” “26...30,” and “46K...50K” for the
attributes department, age, and salary, respectively, what would a naïve Bayesian
classification of the status for the tuple be?
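One possible way to approach part (a) is to let every row contribute its count, rather than 1, whenever class frequencies are tallied. The sketch below applies that idea to entropy and information gain. It assumes information gain is the attribute selection measure, and the names ROWS, ATTRS, weighted_entropy, and weighted_info_gain are illustrative, not taken from the text.

```python
from collections import defaultdict
from math import log2

# Rows copied from the exercise table: (department, status, age, salary, count).
ROWS = [
    ("sales", "senior", "31...35", "46K...50K", 30),
    ("sales", "junior", "26...30", "26K...30K", 40),
    ("sales", "junior", "31...35", "31K...35K", 40),
    ("systems", "junior", "21...25", "46K...50K", 20),
    ("systems", "senior", "31...35", "66K...70K", 5),
    ("systems", "junior", "26...30", "46K...50K", 3),
    ("systems", "senior", "41...45", "66K...70K", 3),
    ("marketing", "senior", "36...40", "46K...50K", 10),
    ("marketing", "junior", "31...35", "41K...45K", 4),
    ("secretary", "senior", "46...50", "36K...40K", 4),
    ("secretary", "junior", "26...30", "26K...30K", 6),
]
ATTRS = {"department": 0, "age": 2, "salary": 3}  # column index of each attribute
CLASS_IDX, COUNT_IDX = 1, 4                        # status is the class label


def weighted_entropy(rows):
    """Entropy of the class label, with each row weighted by its count."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[CLASS_IDX]] += row[COUNT_IDX]
    n = sum(totals.values())
    return -sum((c / n) * log2(c / n) for c in totals.values() if c > 0)


def weighted_info_gain(rows, attr_idx):
    """Information gain of splitting on one attribute, using counts as weights."""
    n = sum(row[COUNT_IDX] for row in rows)
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attr_idx]].append(row)
    remainder = sum(
        sum(r[COUNT_IDX] for r in part) / n * weighted_entropy(part)
        for part in partitions.values()
    )
    return weighted_entropy(rows) - remainder


if __name__ == "__main__":
    for name, idx in ATTRS.items():
        print(name, round(weighted_info_gain(ROWS, idx), 4))
```

The rest of the basic algorithm would stay as it is. Only the frequency tallies, and any majority vote at a leaf, would need to sum counts instead of counting rows.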
8.8 RainForest is a scalable algorithm for decision tree induction. Develop a scalable naïve
Bayesian classification algorithm that requires just a single scan of the entire data set
for most databases. Discuss whether such an algorithm can be refined to incorporate
boosting to further enhance its classification accuracy.
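As a starting point, note that a naïve Bayesian classifier over categorical attributes needs only class counts and per-class attribute-value counts, and both can be accumulated in one pass. The following is a minimal sketch under that assumption. The class name CountingNaiveBayes, its methods, and the optional Laplace smoothing parameter alpha are illustrative choices, not part of the text.

```python
from collections import defaultdict


class CountingNaiveBayes:
    """Naive Bayes over categorical attributes, built from counts from one scan."""

    def __init__(self):
        self.class_counts = defaultdict(float)
        # (attribute index, attribute value, class label) -> accumulated weight
        self.value_counts = defaultdict(float)
        # attribute index -> set of distinct values seen (used for smoothing)
        self.values_seen = defaultdict(set)

    def update(self, features, label, weight=1.0):
        """Fold one tuple into the counts; weight is assumed to be positive."""
        self.class_counts[label] += weight
        for i, value in enumerate(features):
            self.value_counts[(i, value, label)] += weight
            self.values_seen[i].add(value)

    def scores(self, features, alpha=0.0):
        """Return P(class) * prod P(value | class) for every class seen so far."""
        total = sum(self.class_counts.values())
        result = {}
        for label, class_count in self.class_counts.items():
            score = class_count / total
            for i, value in enumerate(features):
                distinct = len(self.values_seen[i])
                hits = self.value_counts.get((i, value, label), 0.0)
                score *= (hits + alpha) / (class_count + alpha * distinct)
            result[label] = score
        return result

    def predict(self, features, alpha=0.0):
        """Return the class with the highest score."""
        scores = self.scores(features, alpha)
        return max(scores, key=scores.get)
```

Each tuple is passed to update exactly once, so the data set is scanned a single time. A row of a generalized table, such as the one in Exercise 8.7, can be passed with weight set to its count.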
8.9 Design an efficient method that performs effective naïve Bayesian classification over
an infinite data stream (i.e., you can scan the data stream only once). If we wanted
to discover the evolution of such classification schemes (e.g., comparing the classification
scheme at this moment with earlier schemes such as one from a week ago), what
modified design would you suggest?
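One design worth considering relies on the fact that naïve Bayesian counts are additive: if counts are kept per fixed-size window of the stream, the counts for any recent period are the sum of its windows, and models from different periods can be compared. The sketch below illustrates only this bookkeeping. The class WindowedCounts, its parameters window_size and max_windows, and the decision to discard the oldest windows are all assumptions made for illustration.

```python
from collections import Counter, deque


class WindowedCounts:
    """Per-window naive Bayes counts over a stream, with bounded memory."""

    def __init__(self, window_size, max_windows):
        self.window_size = window_size
        # Closed windows, oldest first; the oldest is dropped once full.
        self.windows = deque(maxlen=max_windows)
        self.current = Counter()  # counts for the window still being filled
        self.seen = 0

    def update(self, features, label):
        """Fold one stream tuple into the open window; close it when full."""
        self.current[("class", label)] += 1
        for i, value in enumerate(features):
            self.current[(i, value, label)] += 1
        self.seen += 1
        if self.seen % self.window_size == 0:
            self.windows.append(self.current)
            self.current = Counter()

    def merged(self, last_k=None):
        """Sum the open window with the last_k most recent closed windows.

        last_k=None uses every closed window that is still retained.
        """
        closed = list(self.windows)
        if last_k is not None:
            closed = closed[-last_k:] if last_k > 0 else []
        total = Counter(self.current)
        for window in closed:
            total.update(window)  # Counter.update adds counts
        return total
```

Probabilities derived from merged(last_k=1) and from merged() could then be compared to see how the class priors and conditional probabilities have drifted. In practice the windows would more likely be defined by timestamps than by tuple counts.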
8.10 Show that accuracy is a function of sensitivity and specificity, that is, prove Eq. (8.25).
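The outline below assumes Eq. (8.25) has the usual weighted form, and that TP and TN denote the numbers of true positives and true negatives while P and N denote the numbers of positive and negative tuples. This notation is an assumption, since the equation itself is not reproduced here.

```latex
\begin{align*}
\text{accuracy}
  &= \frac{TP + TN}{P + N}
   = \frac{TP}{P + N} + \frac{TN}{P + N} \\
  &= \frac{TP}{P} \cdot \frac{P}{P + N} + \frac{TN}{N} \cdot \frac{N}{P + N} \\
  &= \text{sensitivity} \cdot \frac{P}{P + N}
   + \text{specificity} \cdot \frac{N}{P + N},
\end{align*}
% using sensitivity = TP / P and specificity = TN / N.
```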
8.11 The harmonic mean is one of several kinds of averages. Chapter 2 discussed how to
compute the arithmetic mean, which is what most people typically think of when they
compute an average. The harmonic mean, H, of the positive real numbers, x_1, x_2, ..., x_n,
 