Statistics is a discipline that deals with data: summarizing them, organizing them, finding patterns in them, and making inferences from them. Before 1850 the word statistics simply referred to sets of facts, usually numerical, that described aspects of the state; that meaning survives in the various sets of government statistics, for example the monthly report on the nation’s unemployment rate and the voluminous tables produced after each decennial census. During the twentieth century, as a result of the work of Karl Pearson, Ronald Fisher, Jerzy Neyman, Egon Pearson, John Tukey, and others, the term came to be used much more broadly to include theories and techniques for presenting and analyzing such data and for drawing inferences from them. Two works by Stephen Stigler, The History of Statistics: The Measurement of Uncertainty before 1900 (1986) and Statistics on the Table: The History of Statistical Concepts and Methods (1999), offer broad and readable accounts of the history of statistics.
Although often taught in departments of mathematics, statistics is much more than a branch of applied mathematics. It draws on probability theory in many of its applications, and on finite mathematics and calculus to derive many of its basic theoretical concepts, but it is a separate discipline that requires its practitioners to understand data as well as mathematics.
In a sense, statistics is mainly concerned with variability. If every object of the same class were identical, we would have no need for statistics. If all peas were indeed alike, we could measure just one and know all about peas. If all families reacted similarly to an income supplement, we would have no need to mount a large-scale negative income tax experiment. If all individuals held the same opinion on an issue of the day, we would need to ask only one person, and we would need to take no particular care in choosing that person. Variability, however, is a fact of life, and so statistics is needed to reveal patterns in the face of variability.
Statistics is used in the collection of data in several ways. If the data are to be collected via an experiment, statistical theory directs how to design the experiment so that it yields as much information as possible. The principles of replication (to allow the measurement of variability), control (to eliminate known sources of extraneous variability), and randomization (to "even out" unknown sources of variation), as enunciated by Fisher in his 1935 book The Design of Experiments, help ensure that if differences are found between those receiving the experimental treatment(s) and those in the control group(s), the differences can be attributed to the treatment(s) rather than to preexisting differences between the groups or to experimental error. If the data are to be collected via a sample survey, the principles of probability sampling ensure that the findings can be generalized to the population from which the sample was drawn. Variations on simple random sampling (which is analogous to drawing numbers out of a hat) exploit known properties of a population to make sampling more efficient. Stratified sampling, for example, is analogous to blocking in experimental design: it divides the population into strata of similar units and samples within each, so that variability between strata no longer affects the estimate.
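The contrast between simple random and stratified sampling, and the mechanics of random assignment to treatment and control, can be sketched in Python. This is an illustrative sketch under stated assumptions: the two-stratum population is synthetic, the helper names (simple_random_sample, stratified_mean_estimate, random_assignment) are hypothetical, and proportional allocation is only one of several ways to divide a sample among strata. Repeating each design many times shows how much its estimate of the population mean varies from sample to sample.

```python
import random
import statistics

def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every subset of size n is equally likely."""
    return rng.sample(population, n)

def stratified_mean_estimate(strata, n, rng):
    """Proportional allocation: stratum h gets about n * N_h / N units.
    Returns the weighted combination of the within-stratum sample means."""
    N = sum(len(units) for units in strata.values())
    estimate = 0.0
    for units in strata.values():
        weight = len(units) / N
        n_h = max(1, round(n * weight))
        estimate += weight * statistics.mean(rng.sample(units, n_h))
    return estimate

def random_assignment(units, rng):
    """Shuffle units and split them into treatment and control groups of (nearly) equal size."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(42)
# Hypothetical population: two internally homogeneous strata with very different levels.
strata = {
    "A": [rng.gauss(10, 1) for _ in range(500)],
    "B": [rng.gauss(30, 1) for _ in range(500)],
}
population = strata["A"] + strata["B"]

# Repeat each sampling design many times and record its estimate of the mean.
srs_means = [statistics.mean(simple_random_sample(population, 20, rng)) for _ in range(2000)]
strat_means = [stratified_mean_estimate(strata, 20, rng) for _ in range(2000)]

print(f"true mean          {statistics.mean(population):.2f}")
print(f"SRS spread         {statistics.stdev(srs_means):.3f}")
print(f"stratified spread  {statistics.stdev(strat_means):.3f}")

# Randomly assign 20 hypothetical subjects (labeled 1..20) to two groups.
treated, control = random_assignment(range(1, 21), rng)
print("treatment group:", sorted(treated))
```

Because each stratum here is internally homogeneous, nearly all of the population's variability lies between the strata; stratification removes that component from the estimate, so the spread of the stratified estimates should come out roughly an order of magnitude smaller than that of the simple random sample estimates of the same size.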
Once data are collected, via experiments, sample surveys, censuses, or other means, they rarely speak for themselves. Variability arises from intrinsic differences among the units themselves, from differences in their reactions to the experimental treatments, or from errors in the measuring process itself. Measures of the central tendency of a variable (e.g., means, medians) summarize its typical value despite this variability, making it possible to see patterns and to compare groups. Measures of the variability of a variable (e.g., ranges and standard deviations) describe the spread of the data, both within a group and in comparisons between groups. Correlation and regression summarize the patterns of relations between variables: for example, how does a nation’s GDP per capita relate to its literacy rate? These numerical techniques work hand in hand with graphical techniques (e.g., histograms, scattergrams) to reveal patterns in the data. Indeed, numerical summaries used without examining graphical representations of the data can be misleading. Of course, there are many more complicated and sophisticated summary measures (e.g., multiple regression) and graphical techniques (e.g., residual plots) that aid in summarizing data. Much of modern data analysis, especially as developed by John Tukey, relies on less conventional measures, on transformations of data, and on novel graphical techniques. Procedures such as correspondence analysis and data mining harness the power of modern computing to search for patterns in very large datasets.
FREQUENTIST AND BAYESIAN INFERENCE
Perhaps the most important use of statistics, however, is in making inferences. One is rarely interested merely in the reactions of the subjects in an experiment or the answers of the members of a sample; instead one wishes to generalize to people who are like the experimental subjects or to draw inferences about the population from which the sample was drawn. There are two major modes of making such inferences.
Classical or frequentist inference (the mode most often taught and used in the social sciences) conceives of the current experiment or sample as one of a conceptually infinite number of such procedures carried out in the same way. It then uses the principles codified by Fisher and refined by Neyman and Pearson to ask whether the differences found in an experiment or sample survey are large enough that they would be unlikely to arise by chance alone. Specifically, it posits a null hypothesis, typically of no difference or no effect, that is the opposite of what the investigator believes to be true and has set out to demonstrate. If the observed outcome (or sample quantity), or one more extreme, would be unlikely to occur were the null hypothesis true, the null hypothesis is rejected. Conventionally, if this probability (the p-value) is less than .05 (or sometimes .01), the result is declared "statistically significant."
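The logic of a significance test can be made concrete with an exact permutation (randomization) test, which follows Fisher's reasoning directly: if the treatment has no effect, the group labels are arbitrary, so every relabeling of the pooled observations is equally likely. This is a swapped-in illustrative technique rather than one the text prescribes (a t-test is the more familiar classical choice); the scores are hypothetical, the function name is invented for the example, and enumerating every relabeling is feasible only for small groups.

```python
from itertools import combinations
from statistics import mean

def permutation_test(treated, control):
    """Exact two-sided permutation test for a difference in means.
    Under the null hypothesis of no treatment effect, every split of the pooled
    observations into groups of the original sizes is equally likely; the p-value
    is the share of splits at least as extreme as the observed one."""
    pooled = treated + control
    observed = abs(mean(treated) - mean(control))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(treated)):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(t) - mean(c)) >= observed - 1e-12:  # small tolerance for float ties
            extreme += 1
    return observed, extreme / total

# Hypothetical outcome scores for six treated and six control subjects.
treated = [23, 25, 28, 30, 27, 26]
control = [20, 22, 21, 24, 19, 23]
diff, p = permutation_test(treated, control)
print(f"difference in means = {diff}, p = {p:.4f}")
print("statistically significant at .05" if p < 0.05 else "not significant at .05")
```

With these numbers the observed difference in means is 5.0, and only 6 of the 924 possible splits produce a difference at least that large in either direction, so p = 6/924, about .0065, and the null hypothesis of no treatment effect would be rejected at the .05 level.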
Frequentists also carry out estimation by placing a confidence interval around a quantity measured from the sample to infer the value of the corresponding quantity in the population. For example, if a sample survey reports that 55 percent of the sample favor a particular candidate and gives a 95 percent confidence interval of 52 to 58 percent, the meaning is that the interval was produced by a procedure that, over repeated samples, yields intervals covering the true population percentage 95 percent of the time. The frequentist does not know (and cannot assign a probability to) whether any particular interval covers the true population percentage; the confidence is in the procedure, not in the interval itself. Further, the interval accounts only for sampling error, the variation among the conceptually infinite number of replications of the current procedure. It does not account for nonsampling error arising from such problems in data collection as poorly worded questions, nonresponse, and attrition from a sample.
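Both the interval and the "confidence is in the procedure" interpretation can be checked by simulation. The sketch assumes a simple random sample of 1,000 respondents (the text gives no sample size) and uses the normal-approximation (Wald) interval for a proportion, only one of several interval formulas in use; the true population share of 55 percent in the coverage simulation and the function names are likewise hypothetical. Each simulated survey models sampling error only, so the simulation says nothing about nonsampling error.

```python
import random
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def wald_interval(successes, n, z=Z95):
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

def coverage(true_p, n, reps, rng):
    """Share of repeated simulated surveys whose interval covers true_p."""
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < true_p for _ in range(n))  # one survey of n people
        lo, hi = wald_interval(successes, n)
        hits += lo <= true_p <= hi
    return hits / reps

low, high = wald_interval(550, 1000)
print(f"55% of 1,000: 95% CI {low:.3f} to {high:.3f}")

rng = random.Random(1)
cov = coverage(0.55, 1000, 2000, rng)
print(f"coverage over 2,000 simulated surveys: {cov:.3f}")
```

For 550 of 1,000 respondents the interval is about 0.519 to 0.581, matching the 52 to 58 percent in the example. Across repeated simulated surveys, the share of intervals that cover the true 55 percent should land close to 0.95, while any single interval either covers it or does not.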
For these mechanisms of classical statistics to be used appropriately, a probability mechanism (probability sampling or randomization) must have been used to collect the data. In the social sciences this caution is often ignored: statistical inference is performed on data collected by nonprobabilistic means and even on complete enumerations. Little statistical theory justifies such applications, although superpopulation models, which treat the observed population as itself a sample from a larger hypothetical population, are sometimes invoked to justify them, and social scientists often argue that the process by which the data were accumulated resembles a random one.
Since the 1970s there has been a major renewal of interest in what was historically called inverse probability and is now called Bayesian inference, after Thomas Bayes (1701?–1761), the English nonconformist minister and (during his lifetime) unpublished mathematician. Bayesian inference formally admits the experimenter’s or analyst’s subjective prior distribution into the analysis and uses Bayes’ theorem (an accepted theorem of probability for frequentists and Bayesians alike) to combine that prior distribution with the data from the current investigation, updating the probability that the hypothesis under investigation is true. Note that Bayesians do speak of the probability that a hypothesis is true, whereas frequentists must phrase their conclusions in terms of the probability of outcomes when the null hypothesis is true. Further, Bayesians construct credibility intervals, for which, unlike the frequentists’ confidence intervals, it is proper to speak of the probability that the population quantity falls in the interval, because in the Bayesian stance population parameters are viewed as having probability distributions. For a frequentist, a population parameter is a fixed, albeit usually unknown, constant. Much of the revival of interest in Bayesian analysis has followed advances in computing that make it possible to approximate previously intractable models.
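Bayes' theorem (the posterior is proportional to the prior times the likelihood) can be applied numerically to a polling example with a grid approximation. The sketch is illustrative: the Beta(2, 2) prior and the data (55 of 100 respondents in favor) are hypothetical, the function names are invented for the example, and a simple grid stands in for the more elaborate computational methods used for realistic models. Unlike a confidence interval, the resulting credibility interval supports a direct probability statement about the parameter.

```python
import math

def grid_posterior(a, b, successes, n, points=10_000):
    """Grid approximation to the posterior of a proportion theta.
    Prior: Beta(a, b) (an assumed, illustrative choice). Likelihood: binomial.
    Bayes' theorem: posterior is proportional to prior times likelihood."""
    grid = [(i + 0.5) / points for i in range(points)]  # midpoints in (0, 1)
    log_post = [
        (a - 1 + successes) * math.log(t) + (b - 1 + n - successes) * math.log(1 - t)
        for t in grid
    ]
    peak = max(log_post)  # subtract the maximum before exponentiating to avoid underflow
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]

def credible_interval(grid, probs, level=0.95):
    """Equal-tailed interval: cut (1 - level) / 2 of posterior mass from each tail."""
    tail = (1 - level) / 2
    cumulative, low, high = 0.0, None, None
    for t, p in zip(grid, probs):
        cumulative += p
        if low is None and cumulative >= tail:
            low = t
        if high is None and cumulative >= 1 - tail:
            high = t
    return low, high

# Hypothetical: a Beta(2, 2) prior and a sample in which 55 of 100 favor the candidate.
grid, probs = grid_posterior(2, 2, 55, 100)
post_mean = sum(t * p for t, p in zip(grid, probs))
prob_majority = sum(p for t, p in zip(grid, probs) if t > 0.5)
low, high = credible_interval(grid, probs)
print(f"posterior mean {post_mean:.3f}")
print(f"P(theta > 0.5 | data) = {prob_majority:.3f}")
print(f"95% credibility interval {low:.3f} to {high:.3f}")
```

Because the beta prior is conjugate to the binomial likelihood, the exact posterior here is Beta(57, 47), with mean 57/104 ≈ 0.548; the grid result should agree to about three decimal places, which offers a check on the approximation.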
While the distinction between Bayesians and frequentists has been fairly sharp, Stephen E. Fienberg and Joseph B. Kadane (2001) note that the two schools are coming together, with Bayesians paying increasing attention to the frequentist properties of Bayesian procedures and frequentists increasingly using hierarchical models.
William H. Kruskal (1968) and Fienberg and Kadane (2001) give much more detailed descriptions of the field of statistics and its ramifications than is possible here.