STATISTICAL NOISE (Social Science)

Statistical noise refers to variability within a sample, stochastic disturbance in a regression equation, or estimation error. This noise is often represented as a random variable. In the case of a regression equation

$$Y = m(x \mid \theta) + \varepsilon,$$

with E(ε) = 0, the random variable ε is called a disturbance or error term and reflects statistical noise. Noise in this context is usually viewed as arising from omitted explanatory variables, so the error term serves as a proxy for variables not included in the regression. Variables may be omitted from the regression for several reasons. The theory determining the behavior of the dependent variable Y may be incomplete, or some variables known to influence Y may be unavailable to the researcher. Variables that have only a slight influence on Y might be dropped from the regression to keep the model parsimonious. If the conditional mean function m(x | θ) is specified parametrically (e.g., as a linear function estimated by ordinary least squares), the error term may also reflect specification error, because the chosen functional form may be only an approximation to the true form of m(x | θ).
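
To make the omitted-variable interpretation concrete, the following sketch simulates a hypothetical data-generating process in which Y depends on two regressors, x1 and x2, while the fitted regression includes only x1. The variable names, coefficients, and noise level are illustrative assumptions, not taken from any study. Because x2 is drawn independently of x1, omitting it should leave the slope on x1 roughly unchanged, but its contribution 1.5·x2 is absorbed into the error term. The residual variance should therefore be close to 1.5² + 0.5² = 2.5 rather than the 0.25 contributed by the pure noise u alone.

```python
import random
import statistics


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line of y on one regressor x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def simulate_omitted_variable(n=10_000, seed=0):
    rng = random.Random(seed)
    # Hypothetical data-generating process: Y = 1 + 2*x1 + 1.5*x2 + u.
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]   # drawn independently of x1
    u = [rng.gauss(0, 0.5) for _ in range(n)]  # pure noise, variance 0.25
    y = [1.0 + 2.0 * a + 1.5 * b + e for a, b, e in zip(x1, x2, u)]
    # The researcher regresses Y on x1 only; x2 is omitted.
    intercept, slope = ols_simple(x1, y)
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x1, y)]
    return slope, statistics.pvariance(residuals)


slope, resid_var = simulate_omitted_variable()
print(f"slope on x1:       {slope:.2f}")      # should be near 2.0
print(f"residual variance: {resid_var:.2f}")  # should be near 1.5**2 + 0.5**2 = 2.5
```

If x2 were correlated with x1, the omitted term would no longer be uncorrelated with the included regressor, and the slope on x1 would be biased. The independence assumption here keeps the example focused on the noise interpretation.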


Even if the regression equation includes all relevant variables and the conditional mean function is correctly specified, the error term may reflect measurement error in Y or intrinsic randomness in Y. Intrinsic randomness might result from nonsystematic variation in human behavior when Y describes the actions of individuals. Tastes, preferences, and the like may be explained partly by other variables, but notions of bounded rationality in microeconomic theory suggest that some behavior is inexplicable.
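
The measurement-error source of noise can be illustrated in the same way. The sketch below runs the same regression on a hypothetical true outcome and on a noisily recorded version of it. It rests on illustrative assumptions: a linear model with slope 3, intrinsic noise with standard deviation 0.5, and independent recording error with standard deviation 1.0. Under these assumptions the two slope estimates should be similar. The residual variance, however, should rise from about 0.25 to about 0.25 + 1.0 = 1.25, because the measurement error simply joins the error term.

```python
import random
import statistics


def slope_and_residual_variance(x, y):
    """Fit y on x by least squares; return the slope and the residual variance."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return slope, statistics.pvariance(residuals)


rng = random.Random(1)
n = 10_000
x = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical true outcome: intrinsic randomness with standard deviation 0.5.
y_true = [3.0 * xi + rng.gauss(0, 0.5) for xi in x]
# Recorded outcome: true value plus independent measurement error (sd 1.0).
y_measured = [yt + rng.gauss(0, 1.0) for yt in y_true]

slope_true, noise_true = slope_and_residual_variance(x, y_true)
slope_meas, noise_meas = slope_and_residual_variance(x, y_measured)
print(f"true Y:     slope {slope_true:.2f}, residual variance {noise_true:.2f}")
print(f"measured Y: slope {slope_meas:.2f}, residual variance {noise_meas:.2f}")
```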

In the typical estimation paradigm, a finite sample of size n is drawn and used to compute an estimate θ̂ of some quantity of interest θ. Even if the estimator is statistically consistent, the estimate obtained will typically differ from the true value θ, because the researcher has only a finite sample rather than an infinite amount of data. The difference between θ̂ and θ can be expressed by writing

$$\hat{\theta} = \theta + \varepsilon,$$

where ε again represents statistical noise, which can be positive or negative. In principle, one could draw many samples of size n and compute an estimate of θ from each. Each estimate would differ from the true θ, and these random differences constitute a form of statistical noise. Here the noise arises because no two samples of size n have exactly the same characteristics. The means, variances, and other summary statistics of observations on individual variables differ across samples, and they also differ from the corresponding quantities in the underlying population from which the data are drawn.
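
The thought experiment of drawing many samples can be simulated directly. Purely for illustration, this sketch assumes an exponential population with mean θ = 2 and uses the sample mean as the estimator θ̂. Each replication yields one realization of ε = θ̂ − θ. The average error should be near zero, while the spread of the estimates should shrink roughly like θ/√n, which is the standard deviation of the sample mean for this population.

```python
import random
import statistics


def sampling_noise(theta=2.0, n=50, replications=1_000, seed=42):
    """Estimate theta by the sample mean in many independent samples of size n.

    Returns the average estimation error (theta_hat - theta) and the standard
    deviation of the estimates across replications.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        # Hypothetical population: exponential with mean theta.
        sample = [rng.expovariate(1.0 / theta) for _ in range(n)]
        estimates.append(statistics.fmean(sample))
    errors = [est - theta for est in estimates]  # realizations of epsilon
    return statistics.fmean(errors), statistics.stdev(estimates)


for n in (10, 100, 1_000):
    mean_error, spread = sampling_noise(n=n)
    theory = 2.0 / n ** 0.5
    print(f"n={n:5d}  mean error {mean_error:+.3f}  spread {spread:.3f}  theta/sqrt(n) {theory:.3f}")
```

Increasing `n` narrows the spread but never eliminates it for a finite sample. In this sense even a consistent estimator still carries noise in practice.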

Statistical noise plays a large role in determining what can be learned from a sample of data in any estimation setting. The variance of regression residuals determines, in part, the goodness of fit of an estimated regression line as well as the variance of estimators of regression parameters and other quantities. The variance of an estimator determines the precision of the estimates obtained from data. This in turn affects the width of confidence intervals and the ability to reject null hypotheses of the form H₀: θ = 0. Noisier data yield wider intervals and make a false null hypothesis harder to reject. Statistical noise is often assumed to be normally distributed, but this assumption is inappropriate in many settings.
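
The link between noise variance, estimator precision, and confidence-interval width can be seen in a simple linear regression. There, the slope estimator has standard error √(s² / Σ(xᵢ − x̄)²), where s² is the estimated noise variance. The sketch below refits one hypothetical design at several noise levels σ. As simplifying assumptions, it uses a large-sample normal critical value (about 1.96) rather than the exact t critical value. It also reuses the same seed, so that only the noise scale changes between fits. As a result, the interval width grows in exact proportion to σ, and the t statistic for the null hypothesis of a zero slope falls as σ rises.

```python
import math
import random
import statistics


def slope_interval(sigma, n=200, beta=0.3, seed=7):
    """Fit y = a + b*x by least squares on simulated data with noise sd sigma.

    Returns the slope estimate, an approximate 95% confidence interval, and the
    t statistic for the null hypothesis that the slope is zero.
    """
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [1.0 + beta * xi + rng.gauss(0, sigma) for xi in x]
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)  # noise variance estimate
    se = math.sqrt(s2 / sxx)                                            # standard error of b
    z = statistics.NormalDist().inv_cdf(0.975)  # large-sample critical value, about 1.96
    return b, (b - z * se, b + z * se), b / se


for sigma in (1.0, 5.0, 20.0):
    b, (lo, hi), t = slope_interval(sigma)
    print(f"sigma={sigma:5.1f}  slope {b:+.3f}  95% CI [{lo:+.3f}, {hi:+.3f}]  width {hi - lo:.3f}  t {t:.2f}")
```

Using the exact t critical value with n − 2 degrees of freedom would widen each interval slightly. With n = 200 the difference is small.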
