CENTRAL LIMIT THEOREM (Social Science)

The central limit theorem (CLT) is a fundamental result in statistics. It states that the sum of a large number of independent and identically distributed (iid) random variables with finite variance will, once suitably standardized, tend to be distributed according to the normal distribution. A first version of the CLT was proved by the French-born mathematician Abraham de Moivre (1667–1754), who spent his working life in England. He showed how the normal distribution can be used to approximate the distribution of the number of heads that will result when a coin is tossed a large number of times.
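Stated a little more formally, and purely as a supplementary sketch: the notation below ($x_i$ for the observations, $\mu$ and $\sigma^2$ for their common mean and variance, $\bar{x}_n$ for the average of the first n observations) is an assumption of this sketch, and the version shown is the classical one for iid variables with finite, positive variance.

```latex
\[
\sqrt{n}\,\frac{\bar{x}_n - \mu}{\sigma} \xrightarrow{d} N(0, 1)
\quad \text{as } n \to \infty,
\qquad \bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i .
\]
```

Equivalently, the sum $x_1 + \cdots + x_n$ is approximately $N(n\mu, n\sigma^2)$ distributed for large n.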

The CLT is the cornerstone of most estimation and inference in statistical models, which in turn are widely used in empirical work in the social sciences. Statistical models involve unknown population parameters that are estimated from a sample. The estimators often take the form of sample averages. According to the CLT, the estimators will therefore be approximately normally distributed for a sufficiently large sample size. This result can be used to draw inferences about the population parameters. One example of a statistical model used in the social sciences is the linear regression model. Here, the CLT can be used to assess whether a chosen set of variables explains the variation in a given response variable.

APPLICATIONS

The CLT has a broad range of applications. Consider, for example, a binomial random variable $S_n$ with parameters (n, p). This variable describes the number of heads in n tosses of a coin with probability 0 < p < 1 of heads. Its distribution is given by

$$P(S_n = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad k = 0, 1, \ldots, n.$$

For n large, this distribution can be difficult to compute. Another way of representing $S_n$ is as a sum of n iid Bernoulli random variables $\{x_1, \ldots, x_n\}$. That is, $S_n = x_1 + x_2 + \cdots + x_n$, where the distribution of $x_i$ is $P(x_i = 1) = 1 - P(x_i = 0) = p$, $i = 1, \ldots, n$. We can therefore apply the CLT to $S_n$, which tells us that $S_n$ is approximately $N(np, np(1 - p))$ distributed for n large enough, since $\mu = E[x_i] = p$ and $\sigma^2 = \mathrm{Var}(x_i) = p(1 - p)$. This result was first proved by de Moivre in 1733.
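The quality of this approximation can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the values n = 100, p = 0.5, k = 55 are illustrative assumptions, and the continuity correction (evaluating the normal distribution at k + 0.5) is a common refinement that is not part of the text above.

```python
import math


def binomial_cdf(n: int, p: float, k: int) -> float:
    """Exact P(S_n <= k), summing the binomial probabilities term by term."""
    return sum(
        math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1)
    )


def normal_cdf(x: float, mean: float, variance: float) -> float:
    """Distribution function of N(mean, variance), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))


def normal_approx_cdf(n: int, p: float, k: int) -> float:
    """CLT approximation to P(S_n <= k) using N(np, np(1 - p)).

    The + 0.5 is a continuity correction (an assumption of this sketch).
    """
    return normal_cdf(k + 0.5, n * p, n * p * (1 - p))


# Hypothetical example values: 100 tosses of a fair coin, at most 55 heads.
n, p, k = 100, 0.5, 55
exact = binomial_cdf(n, p, k)
approx = normal_approx_cdf(n, p, k)
print(f"exact: {exact:.4f}  normal approximation: {approx:.4f}")
```

The exact sum needs k + 1 terms, each involving a large binomial coefficient, whereas the approximation needs a single evaluation of the normal distribution function.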

The most important use of the CLT is probably in drawing inferences about population parameters in statistical models. Most estimators of parameters can be written as sums over the sample observations, and so the CLT can be used to obtain a measure of the precision of the estimator. In particular, it can be used to test hypotheses regarding the parameters. As a simple example, consider an iid sample $\{x_1, \ldots, x_n\}$ with unknown population mean $\mu$ and variance $\sigma^2$. A simple estimator of the parameter $\mu$ is the sample average,

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

We can now use the CLT to conclude that

$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{approximately, for } n \text{ large.}$$

Since the variance is unknown, it needs to be estimated. This can be done using the sample variance,

$$\hat{\sigma}^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

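One can now use the normal approximation for inferential purposes. For example, we can conclude that the interval around the sample average $\bar{x}$,

$$\left[\bar{x} - 1.96\,\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\hat{\sigma}}{\sqrt{n}}\right],$$

contains the population mean $\mu$ with approximately 95 percent probability, where $\hat{\sigma}$ is the square root of the sample variance and 1.96 is the 97.5th percentile of the standard normal distribution; one normally refers to this as the 95 percent confidence interval. The CLT can furthermore be used to test specific hypotheses regarding $\mu$.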
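As an illustration, the interval formed by the sample average plus or minus 1.96 estimated standard errors can be computed and checked by simulation. This is a minimal sketch under stated assumptions: the function names are hypothetical, the sample variance uses the n - 1 denominator, and the data are drawn from an exponential distribution with mean 2, chosen only because it is clearly non-normal.

```python
import math
import random


def sample_mean_and_variance(sample):
    """Sample average and sample variance (n - 1 denominator)."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean, variance


def confidence_interval_95(sample):
    """Approximate 95 percent interval: mean +/- 1.96 * sigma_hat / sqrt(n)."""
    n = len(sample)
    mean, variance = sample_mean_and_variance(sample)
    half_width = 1.96 * math.sqrt(variance / n)
    return mean - half_width, mean + half_width


def simulated_coverage(n=100, repetitions=1000, true_mean=2.0, seed=0):
    """Share of simulated samples whose interval contains the true mean.

    Observations are exponential with mean true_mean (an assumption made
    only for this illustration).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
        lower, upper = confidence_interval_95(sample)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / repetitions


coverage = simulated_coverage()
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

If the normal approximation is adequate at the chosen sample size, the printed share should be close to 0.95; for strongly skewed data and small n it will typically fall somewhat below that.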
