Basic Mathematical Reference - Introduction to Semi-Supervised Learning


APPENDIX

Basic Mathematical Reference

This is a “just enough” quick reference. Please consult standard textbooks for details.

PROBABILITY

The probability of a discrete random variable A taking the value a is P(A = a) ∈ [0, 1]. This is sometimes written as P(a) when there is no danger of confusion.

Normalization: $\sum_a P(A = a) = 1$.

Joint probability: P(A = a, B = b) = P(a, b), the two events both happen at the same time.

Marginalization: $P(A = a) = \sum_b P(A = a, B = b)$.

Conditional probability: P(a | b) = P(a, b)/P(b), the probability of a happening given that b happened (defined when P(b) > 0).

The product rule: P(a,b) = P(a)P(b | a) = P(b)P(a | b) .
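The rules so far (normalization, marginalization, conditional probability, and the product rule) can be checked numerically on a small table. The sketch below is only an illustration: the joint distribution `joint` and the helper names are hypothetical and do not come from the text, and the probabilities are made-up numbers chosen to sum to 1.

```python
# Hypothetical joint distribution P(A = a, B = b) with A in {0, 1} and B in {"x", "y"}.
# The numbers are made up for illustration; they only need to sum to 1.
joint = {
    (0, "x"): 0.1,
    (0, "y"): 0.3,
    (1, "x"): 0.2,
    (1, "y"): 0.4,
}


def marginal_a(a):
    """Marginalization: P(A = a) = sum over b of P(A = a, B = b)."""
    return sum(p for (ai, _), p in joint.items() if ai == a)


def marginal_b(b):
    """Marginalization: P(B = b) = sum over a of P(A = a, B = b)."""
    return sum(p for (_, bi), p in joint.items() if bi == b)


def cond_a_given_b(a, b):
    """Conditional probability: P(a | b) = P(a, b) / P(b)."""
    return joint[(a, b)] / marginal_b(b)


def cond_b_given_a(b, a):
    """Conditional probability: P(b | a) = P(a, b) / P(a)."""
    return joint[(a, b)] / marginal_a(a)


# Normalization: the joint probabilities sum to 1.
total = sum(joint.values())

# Product rule: both factorizations recover the same joint probability.
via_a = marginal_a(1) * cond_b_given_a("x", 1)
via_b = marginal_b("x") * cond_a_given_b(1, "x")
print(total, via_a, via_b, joint[(1, "x")])
```

Up to floating-point rounding, `via_a` and `via_b` both agree with the table entry `joint[(1, "x")]`, which is the product rule in action.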

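Bayes rule: $P(a \mid b) = \frac{P(b \mid a) P(a)}{P(b)}$. In general, we can condition on one or more random variables C: $P(a \mid b, C) = \frac{P(b \mid a, C) P(a \mid C)}{P(b \mid C)}$. In the special case when $\theta$ is the model parameter and $\mathcal{D}$ is the observed data, we have $p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta) p(\theta)}{p(\mathcal{D})}$, where $p(\theta)$ is called the prior, $p(\mathcal{D} \mid \theta)$ the likelihood function of $\theta$ (it is not normalized: in general $\int p(\mathcal{D} \mid \theta) \, d\theta \neq 1$), $p(\mathcal{D}) = \int p(\mathcal{D} \mid \theta) p(\theta) \, d\theta$ the evidence, and $p(\theta \mid \mathcal{D})$ the posterior.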
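As a rough illustration of Bayes rule with a model parameter θ and data D, the sketch below computes a posterior for a coin's heads probability. Everything specific here is an assumption made for the example: θ is restricted to three hypothetical candidate values (so the integrals over θ become sums), the prior is uniform, and the data are 3 heads and 1 tail.

```python
# Hypothetical setup: theta is the heads probability of a coin, restricted to
# three candidate values so that integrals over theta become finite sums.
thetas = [0.25, 0.5, 0.75]
prior = {t: 1.0 / len(thetas) for t in thetas}  # uniform prior p(theta)

# Made-up observed data D: 3 heads and 1 tail.
heads, tails = 3, 1


def likelihood(theta):
    """p(D | theta) for independent coin flips with heads probability theta."""
    return theta ** heads * (1.0 - theta) ** tails


# Evidence: p(D) = sum over theta of p(D | theta) p(theta).
evidence = sum(likelihood(t) * prior[t] for t in thetas)

# Posterior: p(theta | D) = p(D | theta) p(theta) / p(D).
posterior = {t: likelihood(t) * prior[t] / evidence for t in thetas}

# The likelihood, viewed as a function of theta, is not normalized.
likelihood_sum = sum(likelihood(t) for t in thetas)

print(posterior)
print(likelihood_sum)
```

The posterior sums to 1 because of the division by the evidence, whereas `likelihood_sum` does not, which mirrors the remark that the likelihood is not a normalized function of θ.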

Independence: The product rule can be simplified as P(a,b) = P(a)P(b) , if and only if A

and B are independent. Equivalently, under this condition P(a | b) = P(a) , P(b | a) = P(b) .
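The independence condition can be tested directly on a joint table by comparing each entry with the product of its marginals. The following is a minimal sketch; the function `is_independent` and both example tables are hypothetical and use made-up numbers.

```python
import itertools
import math


def is_independent(joint, tol=1e-12):
    """Return True if P(a, b) = P(a) P(b) for every pair (a, b) in the table.

    `joint` maps every (a, b) pair to its probability P(A = a, B = b).
    """
    a_vals = {a for a, _ in joint}
    b_vals = {b for _, b in joint}
    # Marginals obtained by summing out the other variable.
    p_a = {a: sum(joint[(a, b)] for b in b_vals) for a in a_vals}
    p_b = {b: sum(joint[(a, b)] for a in a_vals) for b in b_vals}
    return all(
        math.isclose(joint[(a, b)], p_a[a] * p_b[b], abs_tol=tol)
        for a, b in itertools.product(a_vals, b_vals)
    )


# A joint table built as a product of marginals is independent by construction.
p_a = {0: 0.4, 1: 0.6}
p_b = {"x": 0.3, "y": 0.7}
independent_joint = {(a, b): p_a[a] * p_b[b] for a in p_a for b in p_b}

# Same marginals as above, but P(0, "x") = 0.1 differs from 0.4 * 0.3 = 0.12.
dependent_joint = {(0, "x"): 0.1, (0, "y"): 0.3, (1, "x"): 0.2, (1, "y"): 0.4}

print(is_independent(independent_joint))
print(is_independent(dependent_joint))
```

The two tables share the same marginals, so the comparison shows that marginals alone do not determine whether A and B are independent.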

A continuous random variable x has a probability density function (pdf) p(x) ≥ 0. Unlike discrete random variables, it is possible for p(x) > 1 because it is a probability density, not a probability mass. The probability mass in the interval $[x_1, x_2]$ is $P(x_1 < X < x_2) = \int_{x_1}^{x_2} p(x) \, dx$, which lies in [0, 1].

Normalization: $\int_{-\infty}^{\infty} p(x) \, dx = 1$.
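To see that a density can exceed 1 while probability masses stay within [0, 1], consider a uniform density on [0, 0.5], where p(x) = 2. The sketch below is an illustration only: the choice of density, the helper `integrate` (a simple midpoint Riemann sum), and the integration limits are assumptions for this example. Integrating over [-1, 1] stands in for the whole real line because this density is zero outside [0, 0.5].

```python
def pdf(x):
    """Hypothetical density: uniform on [0, 0.5], so p(x) = 2 inside the interval."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0


def integrate(f, lo, hi, n=100_000):
    """Approximate the integral of f over [lo, hi] with a midpoint Riemann sum."""
    width = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * width) for i in range(n)) * width


# Normalization: the density integrates to (approximately) 1.
total_mass = integrate(pdf, -1.0, 1.0)

# Probability mass in [0.1, 0.3]: approximately 2 * (0.3 - 0.1) = 0.4.
mass_in_interval = integrate(pdf, 0.1, 0.3)

print(pdf(0.25), total_mass, mass_in_interval)
```

Here `pdf(0.25)` is 2, which is greater than 1, yet both integrals remain valid probabilities.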
