LARGE SAMPLE PROPERTIES (Social Science)

In empirical work, researchers typically use estimators of parameters, test statistics, or predictors to learn about a given feature of an underlying model; these estimators are functions of random variables, and as such are themselves random variables. Data are used to obtain estimates, which are realizations of the corresponding estimators, that is, of random variables. Ordinarily, the researcher has available only a single sample of n observations and obtains a single estimate based on this sample; the researcher then wishes to make inferences about the underlying feature of interest. Inference involves the estimation of a confidence interval, a p-value, or a prediction interval, and it requires knowledge about the sampling distribution of the estimator that has been used.

In a small number of cases, exact distributions of estimators can be derived for a given sample size n. For example, in the classical linear regression model, if errors are assumed to be identically, independently, and normally distributed, ordinary least squares estimators of the intercept and slope parameters can be shown to be normally distributed with variance that depends on the variance of the error terms, which can be estimated by the sample variance of the estimated residuals. In most cases, however, exact results for the sampling distributions of estimators with a finite sample are unavailable; examples include maximum likelihood estimators and most nonparametric estimators.
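In matrix notation, writing the model as y = Xβ + ε with ε distributed N(0, σ²I) and X treated as fixed, the exact finite-sample result alluded to here takes the familiar form (a standard textbook statement, supplied for illustration rather than taken from the original text):

\hat{\beta} = (X'X)^{-1}X'y \sim N\!\left(\beta,\; \sigma^{2}(X'X)^{-1}\right), \qquad \hat{\sigma}^{2} = \frac{\hat{e}'\hat{e}}{n-k},

where ê = y − Xβ̂ is the vector of estimated residuals and k is the number of estimated coefficients.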


Large sample, or asymptotic, properties of estimators often provide useful approximations of sampling distributions of estimators that can be reliably used for inference-making purposes. Consider an estimator θ̂_n of some quantity θ. The subscript n denotes the fact that θ̂_n is a function of the n random variables Y_1, …, Y_n; this suggests an infinite sequence of estimators for n = 1, 2, …, each based on a different sample size. The large sample properties of an estimator θ̂_n determine the limiting behavior of the sequence {θ̂_n | n = 1, 2, …} as n goes to infinity, denoted n → ∞. Although the distribution of θ̂_n may be unknown for finite n, it is often possible to derive the limiting distribution of θ̂_n as n → ∞. The limiting distribution can then be used as an approximation to the distribution of θ̂_n when n is finite in order to estimate, for example, confidence intervals. The practical usefulness of this approach depends on how closely the limiting, asymptotic distribution of θ̂_n approximates the finite-sample distribution of the estimator for a given, finite sample size n. This depends, in part, on the rate at which the distribution of θ̂_n converges to the limiting distribution, which is related to the rate at which θ̂_n converges to θ.

CONSISTENCY

The most fundamental property that an estimator might possess is that of consistency. If an estimator is consistent, then more data will be informative; but if an estimator is inconsistent, then in general even an arbitrarily large amount of data will offer no guarantee of obtaining an estimate "close" to the unknown θ. Lacking consistency, there is little reason to consider what other properties the estimator might have, nor is there typically any reason to use such an estimator.

An estimator θ̂_n of θ is said to be weakly consistent if the estimator converges in probability to θ, denoted

\hat{\theta}_n \xrightarrow{\;p\;} \theta .

This occurs whenever

\lim_{n \to \infty} \Pr\left( \left| \hat{\theta}_n - \theta \right| > \varepsilon \right) = 0

for any ε > 0. Other, stronger types of consistency have also been defined, as outlined by Robert J. Serfling in Approximation Theorems of Mathematical Statistics (1980). Convergence in probability means that, for any arbitrarily small (but strictly positive) ε, the probability of obtaining an estimate that differs from θ by more than ε in either direction goes to zero as n → ∞.

Note that weak consistency does not mean that it is impossible to obtain an estimate very different from θ using a consistent estimator with a very large sample size. Rather, consistency is an asymptotic, large sample property; it only describes what happens in the limit. Although consistency is a fundamental property, it is also a minimal property in this sense. Depending on the rate, or speed, with which θ̂_n converges to θ, a particular sample size may or may not offer much hope of obtaining an accurate, useful estimate.
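Weak consistency can be illustrated with a small simulation. The following is a minimal sketch in Python (the Uniform(0, 1) population, tolerance ε = 0.05, sample sizes, and seed are illustrative assumptions): the empirical frequency with which the sample mean falls farther than ε from the true mean θ = 0.5 shrinks toward zero as n grows.

import numpy as np

rng = np.random.default_rng(seed=42)
theta = 0.5     # true mean of the Uniform(0, 1) population
eps = 0.05      # tolerance in the definition of convergence in probability
reps = 2_000    # Monte Carlo replications per sample size

for n in (10, 100, 1_000, 10_000):
    # Draw `reps` samples of size n and compute the sample mean of each.
    means = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    # Empirical estimate of Pr(|theta_hat_n - theta| > eps).
    p_far = np.mean(np.abs(means - theta) > eps)
    print(f"n = {n:6d}   Pr(|mean - theta| > eps) ≈ {p_far:.4f}")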


Often, weakly consistent estimators that can be written as scaled sums of random variables have distributions that converge to a normal distribution. The Lindeberg-Lévy Central Limit Theorem establishes such a result for the sample mean: If Y_1, Y_2, …, Y_n are independent draws from a population with mean μ and finite variance σ², then the sample mean

\bar{Y}_n = n^{-1} \sum_{i=1}^{n} Y_i

may be used to estimate μ, and

\sqrt{n}\,\left( \bar{Y}_n - \mu \right) / \sigma \xrightarrow{\;d\;} N(0, 1).

The factor n^{1/2} is the rate of convergence of the sample mean, and it serves to scale the left-hand side of the above expression so that its limiting distribution, as n → ∞, is stable—in this instance, a standard normal distribution. This result allows one to make inference about the population mean μ—even when the distribution from which the data are drawn is unknown—by taking critical values from the standard normal distribution rather than the often unknown, finite-sample distribution F_n.
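The quality of the normal approximation can be checked by simulation. Below is a minimal sketch in Python (the Exponential(1) population, sample sizes, and seed are illustrative assumptions, not part of the original discussion); it compares the simulated probability that the standardized sample mean falls below 1.96 with the standard normal benchmark:

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(seed=0)
mu, sigma = 1.0, 1.0      # mean and standard deviation of an Exponential(1) population
reps = 20_000             # Monte Carlo replications
benchmark = 0.5 * (1 + erf(1.96 / sqrt(2)))   # standard normal CDF evaluated at 1.96

for n in (5, 30, 200):
    draws = rng.exponential(scale=1.0, size=(reps, n))
    z = np.sqrt(n) * (draws.mean(axis=1) - mu) / sigma   # standardized sample means
    print(f"n = {n:4d}   Pr(Z <= 1.96) ≈ {np.mean(z <= 1.96):.4f}   (normal: {benchmark:.4f})")

As n grows, the simulated probabilities move toward the normal benchmark, even though the underlying data are strongly skewed.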

Standard, parametric estimation problems typically yield estimators that converge in probability at the rate n^{1/2}. This provides a familiar benchmark for gauging convergence rates of other estimators. The fact that the sample mean converges at rate n^{1/2} means that fewer observations will typically be needed to obtain statistically meaningful results than would be the case if the convergence rate were slower. However, the quality of the approximation of the finite-sample distribution of a sample mean by the standard normal is determined by features such as skewness or kurtosis of the distribution from which the data are drawn. In fact, the finite-sample distribution function F_n (or the density or the characteristic function) of the sample mean can be written as an asymptotic expansion, revealing how features of the data distribution affect the quality of the normal approximation suggested by the central limit theorem. The best-known of these expansions is the Edgeworth expansion, which yields an expansion of F_n in terms of powers of n^{-1/2} and higher moments of the distribution of the data. Among those who explain these principles in detail are Harald Cramér in Biometrika (1972), Ole E. Barndorff-Nielsen and David Roxbee Cox in Inference and Asymptotics (1994), and Pranab K. Sen and Julio M. Singer in Large Sample Methods in Statistics: An Introduction with Applications (1993).
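For the standardized sample mean, the leading term of the Edgeworth expansion can be sketched as follows (a one-term expansion under standard regularity conditions; γ denotes the skewness of the data distribution, and Φ and φ are the standard normal distribution and density functions; this particular statement is supplied for illustration rather than quoted from the sources above):

F_n(x) = \Phi(x) - \phi(x)\,\frac{\gamma}{6\sqrt{n}}\left(x^{2} - 1\right) + o\!\left(n^{-1/2}\right).

The correction term vanishes at rate n^{-1/2}, which is one way of seeing why skewness in the data degrades the normal approximation in small samples.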

Many nonparametric estimators converge at rates slower than n^{1/2}. For example, the Nadaraya-Watson kernel estimator (Nadaraya 1964; Watson 1964) and the local linear estimator (Fan and Gijbels 1996) of the conditional mean function converge at rate n^{1/(4+d)}, where d is the number of unique explanatory variables (not including interaction terms); hence, even with only one right-hand-side variable, these estimators converge at a much slower rate, n^{1/5}, than typical parametric estimators. Moreover, the rate of convergence becomes slower with increasing dimensionality, a phenomenon often called the curse of dimensionality. Another example is provided by data envelopment analysis (DEA) estimators of technical efficiency; under certain assumptions, including variable returns to scale, these estimators converge at rate n^{2/(1+d)}, where d is the number of inputs plus the number of outputs. Léopold Simar and Paul W. Wilson discuss this principle in the Journal of Productivity Analysis (2000).
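To make the kernel estimator concrete, the following is a minimal sketch in Python of a Nadaraya-Watson estimator with a Gaussian kernel (the data-generating process, rule-of-thumb bandwidth, and evaluation point are illustrative assumptions, not part of the original discussion):

import numpy as np

def nadaraya_watson(x0, x, y, h):
    # Kernel-weighted local average of y at the point x0, using a Gaussian kernel.
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # kernel weights
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(seed=1)
n = 200
x = rng.uniform(0.0, 1.0, size=n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)   # illustrative regression model

h = x.std() * n ** (-1 / 5)              # bandwidth shrinking with n (rule of thumb)
print(nadaraya_watson(0.5, x, y, h))     # estimate of E[Y | X = 0.5]; true value is 0 here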

The practical implications of a convergence rate slower than n^{1/2} can be seen by considering how much data would be needed to achieve the same stochastic order of estimation error that a parametric estimator converging at rate n^{1/2} attains with a given amount of data. For example, consider a bivariate regression problem with n = 20 observations. Using a nonparametric kernel estimator or a local linear estimator, one would need m observations to attain the same stochastic order of estimation error that would be achieved with parametric, ordinary least-squares regression; setting m^{1/5} = 20^{1/2} yields m ≈ 1,789.
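The arithmetic behind this figure is easy to verify (a small check in Python; the sample size n = 20 and the rates n^{1/2} and n^{1/5} come directly from the example above):

# Equate the stochastic orders of estimation error, m**(-1/5) = 20**(-1/2),
# i.e., m**(1/5) = 20**(1/2), and solve for the nonparametric sample size m.
n_parametric = 20
m = n_parametric ** (5 / 2)    # m = (20**(1/2))**5
print(round(m))                # -> 1789 (approximately)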

The large sample properties of parametric and nonparametric estimators offer an interesting trade-off. Parametric estimators offer fast convergence, so it is possible to obtain meaningful estimates with smaller amounts of data than would be required by nonparametric estimators with slower convergence rates. But this is valid only if the parametric model that is estimated is correctly specified; if it is not, there is specification error, raising the question of whether the parametric estimator is consistent. On the other hand, nonparametric estimators largely avoid the risk of specification error, but often at the cost of slower convergence rates and hence larger data requirements. The convergence rate achieved by a particular estimator determines what might reasonably be considered a "large sample" and whether meaningful estimates might be obtained from a given amount of data.

CENTRAL LIMIT THEOREM

Aris Spanos, in his book Probability Theory and Statistical Inference: Econometric Modeling with Observational Data (1999, pp. 464-465), lists several popular misconceptions concerning the large sample properties of estimators. It is sometimes claimed that the central limit theorem ensures that various distributions converge to a normal distribution in cases where they do not. The Lindeberg-Lévy central limit theorem concerns a particular scaled sum of random variables, and it holds only under certain restrictions (e.g., finite variance). Other scaled summations may have different limiting distributions. Spanos notes that there is a central limit theorem for every member of the Lévy-Khintchine family of distributions, which includes not only the normal, Poisson, and Cauchy distributions, but also a set of infinitely divisible distributions. In addition, continuous functions of scaled summations of random variables converge to several well-known distributions, including the chi-square distribution in the case of quadratic functions.
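The finite-variance restriction has practical bite. A minimal simulation sketch in Python (the Cauchy example, sample sizes, and seed are illustrative choices, not drawn from the original text) shows that sample means of Cauchy draws do not settle down toward a normal limit, in contrast to the finite-variance case above:

import numpy as np

rng = np.random.default_rng(seed=3)
reps = 5_000   # Monte Carlo replications

for n in (10, 100, 1_000):
    # The mean of n independent standard Cauchy draws is itself standard Cauchy,
    # so its dispersion does not shrink as n grows and no normal limit emerges.
    means = rng.standard_cauchy(size=(reps, n)).mean(axis=1)
    q05, q95 = np.quantile(means, [0.05, 0.95])
    print(f"n = {n:5d}   5th and 95th percentiles of the sample mean: {q05:7.2f}, {q95:7.2f}")

The percentiles stay roughly constant across n, whereas for a finite-variance population they would collapse toward the population mean as n grows.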
