With the concept of conditional probability in hand, we can introduce another concept, called independence. Here independence means that the distribution of a random variable does not change when the value of another random variable is given. In machine learning, we often make such assumptions; for example, we say that the data samples are independently and identically distributed (i.i.d.).
According to this definition, we can see that if two random variables X and Y are independent, then

P(X, Y) = P(X)P(Y).
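As a quick illustration (a standard example, not from the text above), consider two independent fair coin flips X and Y:

P(X = heads, Y = heads) = P(X = heads)P(Y = heads) = 0.5 × 0.5 = 0.25.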
Sometimes we will also use conditional independence, which means that once the value of one random variable is known, certain other random variables become independent of each other. That is,
P(X,Y | Z) = P(X | Z)P(Y | Z).
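As a hypothetical illustration (with numbers chosen only for this example), let Z indicate whether a patient has a certain disease, and let X and Y be two symptoms that are conditionally independent given Z. If P(X = 1 | Z = 1) = 0.8 and P(Y = 1 | Z = 1) = 0.6, then

P(X = 1, Y = 1 | Z = 1) = 0.8 × 0.6 = 0.48,

even though X and Y may be strongly correlated marginally, since observing one symptom raises the probability of the disease and hence of the other symptom.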
There are two basic rules in probability theory, which are widely used in various
settings. The first is the Chain Rule. It can be represented as follows:
P(X_1, X_2, ..., X_n) = P(X_1)P(X_2 | X_1) ··· P(X_n | X_1, X_2, ..., X_{n-1}).
The chain rule provides a way of calculating the joint probability of several random variables, and it is especially useful when conditional independence among some of the variables allows the factors to be simplified.
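For instance, with n = 3 the chain rule reads

P(X_1, X_2, X_3) = P(X_1)P(X_2 | X_1)P(X_3 | X_1, X_2).

If X_3 is additionally assumed conditionally independent of X_1 given X_2 (a Markov-style assumption, used here only for illustration), the last factor simplifies to P(X_3 | X_2), giving the more compact factorization P(X_1)P(X_2 | X_1)P(X_3 | X_2).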
The second rule is the Bayes Rule, which allows us to compute the conditional probability P(X | Y) from another conditional probability P(Y | X). Intuitively, it inverts the roles of cause and effect. The Bayes Rule takes the following form:

P(X | Y) = P(Y | X)P(X) / P(Y) = P(Y | X)P(X) / Σ_a P(Y | X = a)P(X = a),

where the sum in the denominator runs over all possible values a of X.
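As a numerical illustration (with numbers chosen only for this example), suppose a disease X = 1 has prior probability P(X = 1) = 0.01, and a test Y comes back positive with probability P(Y = 1 | X = 1) = 0.9 for sick patients and P(Y = 1 | X = 0) = 0.05 for healthy ones. Then

P(X = 1 | Y = 1) = (0.9 × 0.01) / (0.9 × 0.01 + 0.05 × 0.99) = 0.009 / 0.0585 ≈ 0.154,

so even after a positive test the disease remains fairly unlikely, because the prior probability is small.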
21.1.2.2 Continuous Distribution
Now we generalize the above discussions to the continuous case. This time, we need
to introduce the concept of the probability density function (PDF). A probability
density function, p, is a non-negative, integrable function such that

∫ p(x) dx = 1,

where the integral is taken over the entire domain.
The probability that a random variable distributed according to the PDF p falls into an interval [a, b] is computed as follows:

P(a ≤ x ≤ b) = ∫_a^b p(x) dx.
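As a quick sanity check (an illustrative density chosen for this example), take p(x) = 2x on [0, 1] and p(x) = 0 elsewhere. This p is non-negative and integrates to 1, and

P(0.5 ≤ x ≤ 1) = ∫_{0.5}^{1} 2x dx = 1 − 0.25 = 0.75.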