the probability density function, the derivative of the CDF. (The discrete
version is called the probability mass function.)
The Normal and Chi-Square Distributions
The normal distribution, also called a Gaussian distribution, is probably
the most famous of all the statistical distributions. One reason is that its
functional form leads to mathematically convenient results for many different
procedures. For example, clustering algorithms often implicitly assume that the
data within each cluster are normally distributed.
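The other, more important, reason it is so famous is that the distribution
of the mean of observations of a random variable converges toward a normal
distribution as the number of observations goes to infinity. This result is
known as the central limit theorem. Amazingly, it holds regardless of the
underlying distribution, provided certain conditions are met (in its classical
form, the observations must be independent and identically distributed with
finite variance, and these conditions usually hold in practice). In other words,
if you have enough data, you can approximate the distribution of a sample mean
by a normal distribution, whatever the underlying distribution (even a discrete
one!).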
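A quick simulation makes the central limit theorem concrete. The following is a minimal sketch under our own illustrative assumptions (Uniform(0, 1) observations, n = 30 observations per mean, 10,000 replications, a fixed random seed; the class and method names are hypothetical). It draws many sample means and measures how many fall within 1.96 standard errors of the true mean 0.5. If the means are approximately normal, that fraction should be close to 0.95.

```java
import java.util.Random;

/**
 * Sketch: illustrate the central limit theorem by drawing many sample means
 * of Uniform(0, 1) observations. Names are hypothetical, not from the text.
 */
final class CltDemo {

    /** Returns `reps` sample means, each computed from `n` Uniform(0, 1) draws. */
    static double[] sampleMeans(int n, int reps, long seed) {
        Random rng = new Random(seed);
        double[] means = new double[reps];
        for (int r = 0; r < reps; r++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += rng.nextDouble();   // one Uniform(0, 1) observation
            }
            means[r] = sum / n;
        }
        return means;
    }

    /** Fraction of means lying within 1.96 standard errors of the true mean 0.5. */
    static double coverage95(double[] means, int n) {
        // Uniform(0, 1) has variance 1/12, so a mean of n draws has SE sqrt(1 / (12 n)).
        double se = Math.sqrt(1.0 / (12.0 * n));
        int inside = 0;
        for (double m : means) {
            if (Math.abs(m - 0.5) <= 1.96 * se) {
                inside++;
            }
        }
        return (double) inside / means.length;
    }

    public static void main(String[] args) {
        int n = 30;
        double[] means = sampleMeans(n, 10_000, 42L);
        // Under a normal approximation, roughly 95% of the means fall inside the band.
        System.out.printf("coverage = %.3f%n", coverage95(means, n));
    }
}
```

For contrast, with n = 1 the coverage is exactly 1.0 rather than 0.95: a single Uniform(0, 1) draw never lies more than 0.5 from the mean, while 1.96 * sqrt(1/12) is about 0.566. The normal approximation only takes hold as n grows.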
The normal distribution has two parameters, a mean mu and a standard
deviation sigma, and its density has a simple implementation for any
real value of x:
public static double dnorm(double x, double mu, double sig) {
    // Normal density: exp(-(x - mu)^2 / (2 sig^2)) / sqrt(2 pi sig^2)
    return Math.exp(-Math.pow(x - mu, 2) / (2 * sig * sig))
            / Math.sqrt(2 * Math.PI * sig * sig);
}
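One cheap way to catch mistakes in a density implementation, such as a dropped minus sign in the exponent or a misplaced square root, is to check that the function peaks where expected and integrates to 1. The following is a minimal sketch of such a check; the class name, the use of the trapezoid rule, and the integration range of mu plus or minus 10 sigma are our own illustrative choices.

```java
/**
 * Sketch: sanity-check a normal density by evaluating its peak and
 * integrating it numerically. Names and choices here are illustrative.
 */
final class NormalDensityCheck {

    /** Normal density: exp(-(x - mu)^2 / (2 sig^2)) / sqrt(2 pi sig^2). */
    static double dnorm(double x, double mu, double sig) {
        return Math.exp(-Math.pow(x - mu, 2) / (2 * sig * sig))
                / Math.sqrt(2 * Math.PI * sig * sig);
    }

    /** Trapezoid-rule integral of the density over [mu - 10 sig, mu + 10 sig]. */
    static double totalMass(double mu, double sig, int steps) {
        double lo = mu - 10 * sig;
        double hi = mu + 10 * sig;
        double h = (hi - lo) / steps;
        // Endpoints get half weight in the trapezoid rule.
        double sum = 0.5 * (dnorm(lo, mu, sig) + dnorm(hi, mu, sig));
        for (int i = 1; i < steps; i++) {
            sum += dnorm(lo + i * h, mu, sig);
        }
        return sum * h;
    }

    public static void main(String[] args) {
        // The standard normal density peaks at 1 / sqrt(2 pi), about 0.3989.
        System.out.println(dnorm(0.0, 0.0, 1.0));
        // A valid density integrates to (approximately) 1 for any mu and sig > 0.
        System.out.println(totalMass(3.0, 2.5, 10_000));
    }
}
```

Almost all of the normal's probability mass lies within 10 standard deviations of the mean, so truncating the integral there costs nothing visible at double precision.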
If X_1, …, X_k are independent standard normal random variables (mean 0,
variance 1), then the sum of their squares follows what is known as a
chi-square distribution with k degrees of freedom. So the square of a single
standard normal random variable has a chi-square distribution with 1 degree
of freedom. The chi-square distribution is used to model the sample variance
of normally distributed data (for n observations with sample variance s^2,
(n - 1)s^2/sigma^2 has a chi-square distribution with n - 1 degrees of freedom),
as well as to analyze "contingency tables," which are used to determine whether
the rate of occurrence of an event differs between two groups. The density
function for this distribution is fairly complicated.
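For x > 0 and k degrees of freedom, the standard form of the chi-square density is f(x; k) = x^(k/2 - 1) e^(-x/2) / (2^(k/2) Γ(k/2)), where Γ is the gamma function. Because java.lang.Math has no gamma function, the following sketch assumes k is a positive integer, which lets Γ(k/2) be built exactly from Γ(1/2) = sqrt(pi) and Γ(1) = 1. It also simulates sums of squared standard normals so the sum-of-squares definition can be checked against the known mean k. The class and method names are hypothetical.

```java
import java.util.Random;

/**
 * Sketch: chi-square density for integer degrees of freedom k, plus a
 * simulation of sums of squared standard normals. Names are hypothetical.
 */
final class ChiSquareSketch {

    /** Gamma(k / 2) for a positive integer k, built up from Gamma(1/2) or Gamma(1). */
    static double gammaHalf(int k) {
        boolean even = (k % 2 == 0);
        double z = even ? 1.0 : 0.5;                  // start at Gamma(1) or Gamma(1/2)
        double g = even ? 1.0 : Math.sqrt(Math.PI);   // Gamma(1) = 1, Gamma(1/2) = sqrt(pi)
        while (z < k / 2.0) {
            g *= z;                                   // Gamma(z + 1) = z * Gamma(z)
            z += 1.0;
        }
        return g;
    }

    /** Chi-square density with k degrees of freedom; this sketch returns 0 for x <= 0. */
    static double dchisq(double x, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be a positive integer");
        }
        if (x <= 0) {
            return 0.0;
        }
        return Math.pow(x, k / 2.0 - 1) * Math.exp(-x / 2)
                / (Math.pow(2, k / 2.0) * gammaHalf(k));
    }

    /** Draws `reps` values, each the sum of squares of k independent standard normals. */
    static double[] sumsOfSquares(int k, int reps, long seed) {
        Random rng = new Random(seed);
        double[] out = new double[reps];
        for (int r = 0; r < reps; r++) {
            double s = 0.0;
            for (int i = 0; i < k; i++) {
                double z = rng.nextGaussian();        // standard normal draw
                s += z * z;
            }
            out[r] = s;
        }
        return out;
    }

    public static void main(String[] args) {
        // With k = 2 the density reduces to exp(-x/2) / 2.
        System.out.println(dchisq(2.0, 2));
        double[] draws = sumsOfSquares(3, 20_000, 1L);
        double mean = 0.0;
        for (double d : draws) {
            mean += d / draws.length;
        }
        // A chi-square variable with k degrees of freedom has mean k, so expect roughly 3.
        System.out.println(mean);
    }
}
```

Supporting non-integer k would require a general gamma (or log-gamma) routine, for example from a statistics library, instead of the half-integer recurrence used here.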