$P(B_2 \mid A_2) = 500/2000 = 1/4$
$P(B_2 \mid A_3) = 500/1000 = 1/2$

Question: If a product belongs to $B_2$, what is the probability that it comes from $A_1$, $A_2$, or $A_3$?

Solution: Calculate the joint probabilities:
$P(A_1)P(B_2 \mid A_1) = (1/2)(1/3) = 1/6$
$P(A_2)P(B_2 \mid A_2) = (1/3)(1/4) = 1/12$
$P(A_3)P(B_2 \mid A_3) = (1/6)(1/2) = 1/12$

Calculate the total probability $P(B_2)$:

$P(B_2) = \sum_{i=1}^{3} P(A_i)P(B_2 \mid A_i) = (1/2)(1/3) + (1/3)(1/4) + (1/6)(1/2) = 1/3$

Calculate the posterior probabilities with Bayes' formula, $P(A_i \mid B_2) = P(A_i)P(B_2 \mid A_i) / P(B_2)$:

$P(A_1 \mid B_2) = (1/6) \div (1/3) = 1/2$
$P(A_2 \mid B_2) = (1/12) \div (1/3) = 1/4$
$P(A_3 \mid B_2) = (1/12) \div (1/3) = 1/4$
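The arithmetic above can be checked mechanically. The following minimal Python sketch, which is not part of the original text, reproduces the three steps (joint probabilities, total probability, posterior probabilities) with exact rational arithmetic via the standard-library `fractions.Fraction`. The function name `posterior` and the dictionary labels are illustrative choices; the priors and likelihoods are the values used in the example.

```python
from fractions import Fraction

# Priors P(A_i) and likelihoods P(B_2 | A_i), taken from the worked example.
priors = {"A1": Fraction(1, 2), "A2": Fraction(1, 3), "A3": Fraction(1, 6)}
likelihoods = {"A1": Fraction(1, 3), "A2": Fraction(500, 2000), "A3": Fraction(500, 1000)}


def posterior(priors, likelihoods):
    """Return (joint, evidence, posterior) for each hypothesis via Bayes' formula."""
    # Joint probabilities P(A_i) * P(B_2 | A_i)
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # Total probability P(B_2) = sum over i of P(A_i) * P(B_2 | A_i)
    evidence = sum(joint.values())
    # Posterior P(A_i | B_2) = joint probability / total probability
    post = {h: joint[h] / evidence for h in joint}
    return joint, evidence, post


joint, evidence, post = posterior(priors, likelihoods)
print("P(B2) =", evidence)
for h, p in post.items():
    print(f"P({h}|B2) = {p}")
```

Running it prints `P(B2) = 1/3` followed by the posteriors 1/2, 1/4 and 1/4, which sum to 1 as they must. Using `Fraction` rather than floats keeps the results exact, so they can be compared directly with the hand calculation.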
6.3 Bayesian Problem Solving
Bayesian learning theory uses prior information and sample data to estimate
unknown quantities. In Bayesian learning theory, prior information and sample data
are represented by probabilities (joint probabilities and conditional probabilities).
How to estimate these probabilities (a task also called probability density
estimation) is the subject of much controversy in Bayesian learning theory. Bayesian
density estimation focuses on how to estimate the distribution of unknown
variables (or vectors) and its parameters from sample data and from prior
knowledge supplied by human experts. It involves two steps: first, determine the
prior distributions of the unknown variables; second, obtain the parameters of these
distributions. If no prior information is available, the distribution is called a
non-informative prior distribution. If the form of the distribution is known and only
its proper parameters must be found, it is called an informative prior distribution.
Because learning from data is the most fundamental characteristic of data mining,
non-informative prior distributions are the main subject of research in Bayesian
learning theory.
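The two kinds of prior can be illustrated with a simple case. The sketch below is our own assumption, not the author's method: it places a Beta prior on an unknown defect rate and updates it with binomial sample data, using the standard Beta-Binomial conjugate pair. A uniform Beta(1, 1) prior stands in for a non-informative prior (one common choice among several), while Beta(2, 18) represents a hypothetical expert belief that the rate is near 0.1. The sample counts and the helper names `beta_binomial_update` and `beta_mean` are hypothetical.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior combined with binomial data
    yields a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


# Hypothetical sample: 3 defective items out of 10 inspected.
successes, failures = 3, 7

# Non-informative prior: Beta(1, 1) is uniform on [0, 1].
flat = beta_binomial_update(1, 1, successes, failures)

# Informative prior: Beta(2, 18), a hypothetical expert belief centered near 0.1.
expert = beta_binomial_update(2, 18, successes, failures)

print(f"flat prior   -> posterior Beta{flat}, mean {beta_mean(*flat)}")
print(f"expert prior -> posterior Beta{expert}, mean {beta_mean(*expert)}")
```

With the flat prior the posterior mean is 1/3, close to the observed rate 3/10; with the hypothetical expert prior it is pulled toward 0.1 and becomes 1/6. As the sample grows, the data dominate and the two posteriors move closer together.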
The first step of Bayesian problem solving is to select the prior distribution, and
this is a key step. There are two common methods for selecting a prior
distribution: the subjective method and the objective method. The former uses
human experience and expert knowledge to assign the prior distribution. The
latter analyzes the characteristics of the data to obtain their statistical features; it
requires sufficient data to recover the true distribution of the data. In practice, these two methods