With the concept of conditional probability in hand, we can introduce another concept, called independence. Here independence means that the distribution of a random variable does not change when the value of another random variable is given. In machine learning, we often make such assumptions; for example, we say that the data samples are independently and identically distributed (i.i.d.).
According to this definition, we can see that if two random variables X and Y are independent, then

P(X, Y) = P(X)P(Y).
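As a quick illustration (a standard example, not from the text above), consider two independent fair coin flips X and Y:

P(X = heads, Y = heads) = P(X = heads)P(Y = heads) = 0.5 × 0.5 = 0.25.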
Sometimes we will also use conditional independence, which means that once the value of one random variable is known, certain other random variables become independent of each other. That is,
P(X,Y | Z) = P(X | Z)P(Y | Z).
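As a hypothetical illustration (with numbers chosen only for this example), let Z indicate whether a patient has a certain disease, and let X and Y be two symptoms that are conditionally independent given Z. If P(X = 1 | Z = 1) = 0.8 and P(Y = 1 | Z = 1) = 0.6, then

P(X = 1, Y = 1 | Z = 1) = 0.8 × 0.6 = 0.48,

even though X and Y may be strongly correlated marginally, since observing one symptom raises the probability of the disease and hence of the other symptom.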
There are two basic rules in probability theory, which are widely used in various
settings. The first is the Chain Rule. It can be represented as follows:
P(X_1, X_2, ..., X_n) = P(X_1)P(X_2 | X_1) ··· P(X_n | X_1, X_2, ..., X_{n-1}).
The chain rule provides a way of calculating the joint probability of several random variables, and it is especially useful when conditional independence among some of the variables allows the factors to be simplified.
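For instance, with n = 3 the chain rule reads

P(X_1, X_2, X_3) = P(X_1)P(X_2 | X_1)P(X_3 | X_1, X_2).

If X_3 is additionally assumed conditionally independent of X_1 given X_2 (a Markov-style assumption, used here only for illustration), the last factor simplifies to P(X_3 | X_2), giving the more compact factorization P(X_1)P(X_2 | X_1)P(X_3 | X_2).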
The second rule is the Bayes Rule, which allows us to compute the conditional probability P(X | Y) from another conditional probability P(Y | X). Intuitively, it inverts the roles of cause and effect. The Bayes Rule takes the following form:

P(X | Y) = P(Y | X)P(X) / P(Y) = P(Y | X)P(X) / Σ_a P(Y | X = a)P(X = a),

where the sum in the denominator runs over all possible values a of X.
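As a numerical illustration (with numbers chosen only for this example), suppose a disease X = 1 has prior probability P(X = 1) = 0.01, and a test Y comes back positive with probability P(Y = 1 | X = 1) = 0.9 for sick patients and P(Y = 1 | X = 0) = 0.05 for healthy ones. Then

P(X = 1 | Y = 1) = (0.9 × 0.01) / (0.9 × 0.01 + 0.05 × 0.99) = 0.009 / 0.0585 ≈ 0.154,

so even after a positive test the disease remains fairly unlikely, because the prior probability is small.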
21.1.2.2 Continuous Distribution
Now we generalize the above discussions to the continuous case. This time, we need
to introduce the concept of the probability density function (PDF). A probability
density function, p, is a non-negative, integrable function such that

∫ p(x) dx = 1,

where the integral is taken over the entire domain.
The probability that a random variable distributed according to the PDF p falls into an interval [a, b] is computed as follows:

P(a ≤ x ≤ b) = ∫_a^b p(x) dx.
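As a quick sanity check (an illustrative density chosen for this example), take p(x) = 2x on [0, 1] and p(x) = 0 elsewhere. This p is non-negative and integrates to 1, and

P(0.5 ≤ x ≤ 1) = ∫_{0.5}^{1} 2x dx = 1 − 0.25 = 0.75.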