naïve Bayesian classifier, the time complexity of each round is O(nf). T rounds of training therefore correspond to O(Tnf). Since T is a constant, the entire time complexity is still O(nf).
For the naïve Bayesian classifier, the primary computation is counting. Training examples can be processed either sequentially or in batches from disk or tape, so this method is well suited to knowledge discovery on large data sets: the training set need not be loaded into memory entirely, and part of it can be kept on disk or tape. Nevertheless, the boosted naïve Bayesian model has the following problems.
(1) By the very idea of boosting, when noise exists in the training set, the boosting method takes it as useful information and amplifies its effect with large weights. This reduces the performance of boosting; if there are many noisy examples, boosting leads to worse results.
(2) Although in theory boosting can achieve a zero error rate on the training set, in practical applications of the naïve Bayesian model a zero classification error on the training set can hardly be guaranteed.
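The counting-based training described above can be sketched in a short program. This is a minimal illustration, not the book's implementation: it assumes categorical features, a streamed training set, and Laplace smoothing (an assumption added here to avoid zero counts). One sequential pass over n examples with f features performs only counter updates, matching the O(nf) cost discussed above.

```python
from collections import defaultdict
import math

def train_nb(stream, n_features):
    """One sequential pass over the examples: only counting, O(n*f) total.
    The stream may come from disk or tape; nothing is kept but the counts."""
    class_count = defaultdict(int)                  # class -> count
    feat_count = defaultdict(int)                   # (class, feature, value) -> count
    feat_values = [set() for _ in range(n_features)]
    for x, y in stream:
        class_count[y] += 1
        for j, v in enumerate(x):
            feat_count[(y, j, v)] += 1
            feat_values[j].add(v)
    return class_count, feat_count, feat_values

def predict(model, x):
    class_count, feat_count, feat_values = model
    n = sum(class_count.values())
    best, best_lp = None, float("-inf")
    for c, nc in class_count.items():
        lp = math.log(nc / n)                       # log prior
        for j, v in enumerate(x):
            # Laplace smoothing (assumed here) keeps unseen values
            # from zeroing out the whole product.
            num = feat_count[(c, j, v)] + 1
            den = nc + len(feat_values[j])
            lp += math.log(num / den)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Toy data, purely illustrative: two binary features, two classes.
data = [((0, 1), "a"), ((0, 0), "a"), ((1, 1), "b"), ((1, 0), "b")]
model = train_nb(iter(data), 2)
print(predict(model, (0, 1)))   # -> "a"
```

Because the model is just a bag of counters, a boosted variant only has to replace the unit increments with example weights; the per-round cost stays O(nf).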
6.5 Construction of Bayesian Network
6.5.1 Structure of Bayesian network and its construction
In short, a Bayesian network is a directed acyclic graph with probabilistic nodes. The graphical model can be used to represent the (physical or Bayesian) joint distribution of a large variable set, and to analyze correlations among numerous variables. With its capability for learning and statistical reasoning under Bayes' theorem, it can fulfill many data mining tasks, such as prediction, classification, clustering, causal analysis and so on.
Given a series of variables X = {x_1, x_2, …, x_n}, a Bayesian network is composed of two components: one is the network structure S, which represents the conditional independences among the variables in X; the other is the set of local distributions P, one associated with every variable. The two components together define the joint probability of X. S is a directed acyclic graph whose nodes correspond one-to-one with the variables in X. Let x_i be a variable (or node) and Pa_i be the set of parent nodes of x_i in S. The absence of an arc between nodes usually represents conditional independence. The joint probability of X is represented as:

p(X) = ∏_{i=1}^{n} p(x_i | Pa_i)        (6.32)
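Equation (6.32) can be made concrete with a small sketch. The three-node network and its conditional probability tables below are hypothetical, chosen only to illustrate the factorization; each variable is assumed binary, and each table stores p(x_i = 1 | Pa_i) keyed by the parents' values.

```python
# Hypothetical network: x1 -> x2, x1 -> x3, x2 -> x3.
parents = {"x1": [], "x2": ["x1"], "x3": ["x1", "x2"]}

# Illustrative CPTs (assumed values): p(x_i = 1 | parent assignment).
cpt = {
    "x1": {(): 0.3},
    "x2": {(0,): 0.2, (1,): 0.7},
    "x3": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}

def joint(assignment):
    """Equation (6.32): p(X) = prod_i p(x_i | Pa_i)."""
    p = 1.0
    for var, pa in parents.items():
        key = tuple(assignment[q] for q in pa)   # values of Pa_i
        p1 = cpt[var][key]                       # p(var = 1 | Pa_i)
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p

print(joint({"x1": 1, "x2": 1, "x3": 0}))  # 0.3 * 0.7 * (1 - 0.9)
```

Because each local factor is a proper conditional distribution, the eight joint probabilities sum to 1, which is exactly what the factorization in (6.32) guarantees.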