D'Agostino's test to Digit preference

D’Agostino’s test:

A test based on ordered sample values x_(1) ≤ x_(2) ≤ … ≤ x_(n) with mean x̄, used to assess whether the observations arise from a normal distribution. The test statistic is

D = [Σᵢ (i − (n + 1)/2) x_(i)] / (n²√m₂),  where m₂ = Σᵢ (xᵢ − x̄)²/n

Appropriate for testing departures from normality due to skewness. Tables of critical values are available.
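The statistic can be sketched in a few lines of Python. The form used below, D = Σᵢ(i − (n+1)/2)x_(i)/(n²√m₂) with m₂ the second sample moment, is assumed from standard accounts of the test; for normal data D is close to 1/(2√π) ≈ 0.2821.

```python
import math

def dagostino_d(x):
    """D'Agostino's D statistic for assessing departures from normality.

    Assumed form: D = sum_i (i - (n+1)/2) * x_(i) / (n^2 * sqrt(m2)),
    where x_(1) <= ... <= x_(n) and m2 is the second sample moment.
    """
    xs = sorted(x)
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((v - mean) ** 2 for v in xs) / n
    t = sum((i - (n + 1) / 2) * v for i, v in enumerate(xs, start=1))
    return t / (n ** 2 * math.sqrt(m2))

# For an evenly spread symmetric sample, D is near 1/sqrt(12) ≈ 0.2887
print(round(dagostino_d(range(1, 101)), 4))
```

In practice the computed value would be referred to published tables of critical values.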

Daniels, Henry (1912-2000):

Daniels studied at the Universities of Edinburgh and Cambridge, and was first employed at the Wool Industries Research Association in Leeds. This environment allowed Daniels to apply both his mathematical skills to the strength of bundles of threads and his mechanical bent to inventing apparatus for the fibre measurement laboratory. In 1947 he joined the statistical laboratory at the University of Cambridge and in 1957 was appointed as the first Professor of Mathematical Statistics at the University of Birmingham. He remained in Birmingham until his retirement in 1978 and then returned to live in Cambridge. Daniels was a major figure in the development of statistical theory in the 20th century and was President of the Royal Statistical Society in 1974-1975. He was awarded the Guy Medal of the Royal Statistical Society in bronze in 1957 and in gold in 1984. In 1980 Daniels was elected a Fellow of the Royal Society. Daniels was an expert watch-repairer and in 1984 was created a Liveryman of the Worshipful Company of Clockmakers in recognition of his contribution to watch design. Daniels died on 16 April 2000 whilst attending a statistics conference at Gregynog, Powys, Wales.

Darling test:

A test that a set of random variables arises from an exponential distribution. If x₁, x₂, …, xₙ are the n sample values, the test statistic is


where x̄ is the mean of the sample. Asymptotically Kₙ can be shown to have mean (μ) and variance (σ²) given by


so that z = (Kₙ − μ)/σ has asymptotically a standard normal distribution under the exponential distribution hypothesis. [Journal of Statistical Planning and Inference, 1994, 39, 399-424.]

Data archives:

Collections of data that are suitably indexed and are accessible to be utilized by researchers aiming to perform secondary data analysis. An example is the Economic and Social Research Council data archive held at the University of Essex in the UK.

Data augmentation:

A scheme for augmenting observed data so as to make it easier to analyse. A simple example is the estimation of missing values to balance a factorial design with different numbers of observations in each cell. The term is most often used, however, in respect of an iterative procedure, the data augmentation algorithm, common in the computation of the posterior distribution in Bayesian inference. The basic idea behind this algorithm is to augment the observed data y by a quantity z, usually referred to as latent data. It is assumed that given both y and z it is possible to calculate or sample from the augmented data posterior distribution p(θ|y, z). To obtain the observed posterior p(θ|y), multiple values (imputations) of z from the predictive distribution p(z|y) are generated and the average of p(θ|y, z) over the imputations calculated. Because p(z|y) depends on p(θ|y), an iterative algorithm for calculating p(θ|y) results. Specifically, given the current approximation g(θ) to the observed posterior p(θ|y), the algorithm specifies:

• generate a sample z⁽¹⁾, …, z⁽ᵐ⁾ from the current approximation to the predictive distribution p(z|y);

• update the approximation to p(θ|y) to be the mixture of augmented posteriors of θ given the augmented data from the step above, i.e.

p(θ|y) ≈ (1/m) Σⱼ p(θ|y, z⁽ʲ⁾)

The two steps are then iterated. See also EM algorithm and Markov chain Monte Carlo methods.
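The two steps above can be sketched for a deliberately simple, hypothetical case: data assumed normal with known unit variance and a flat prior on the mean, with two observations missing completely at random. Here each iteration draws a single imputation and a single posterior value, a Gibbs-style variant of the scheme.

```python
import random

random.seed(1)

# Hypothetical observed data; two further values are missing
y_obs = [1.2, 0.8, 1.1, 0.9, 1.0]
n_missing = 2
n_total = len(y_obs) + n_missing

mu = 0.0          # current draw of the unknown mean
draws = []
for it in range(2000):
    # Imputation step: draw latent data z from the predictive N(mu, 1)
    z = [random.gauss(mu, 1.0) for _ in range(n_missing)]
    # Posterior step: with a flat prior and unit variance,
    # mu | y, z ~ N(mean of completed data, 1/n)
    ybar = (sum(y_obs) + sum(z)) / n_total
    mu = random.gauss(ybar, (1.0 / n_total) ** 0.5)
    if it >= 1000:           # discard burn-in draws
        draws.append(mu)

est = sum(draws) / len(draws)
print(round(est, 2))  # close to the observed-data mean of 1.0
```

Since the missing values carry no information about the mean here, the averaged draws settle near the observed-data mean, as the theory predicts.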


Database:

A structured collection of data that is organized in such a way that it may be accessed easily by a wide variety of application programs. Large clinical databases are becoming increasingly available to clinical and policy researchers; they are generally used for two purposes: to facilitate health care delivery, and for research. An example of such a database is that provided by the US Health Care Financing Administration, which contains information about all Medicare patients' hospitalizations, surgical procedures and office visits.

Database management system:

A computer system organized for the systematic management of a large structured collection of information, that can be used for storage, modification and retrieval of data.

Data dredging:

A term used to describe comparisons made within a data set not specifically prescribed prior to the start of the study. See also data mining and subgroup analysis.

Data editing:

The action of removing format errors and keying errors from data.

Data mining:

The nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It uses expert systems, statistical and graphical techniques to discover and present knowledge in a form which is easily comprehensible to humans.

Significant biological discoveries are now often made by combining data mining methods with traditional laboratory techniques; an example is the discovery of novel regulatory regions for heat shock genes in C. elegans made by mining vast amounts of gene expression and sequence data for significant patterns.

Data reduction:

The process of summarizing large amounts of data by forming frequency distributions, histograms, scatter diagrams, etc., and calculating statistics such as means, variances and correlation coefficients. The term is also used when obtaining a low-dimensional representation of multivariate data by procedures such as principal components analysis and factor analysis.

Data science:

A term intended to unify statistics, data analysis and related methods. It consists of three phases: design for data, collection of data and analysis of data.

Data screening:

The initial assessment of a set of observations to see whether or not they appear to satisfy the assumptions of the methods to be used in their analysis. Techniques which highlight possible outliers, or, for example, departures from normality, such as a normal probability plot, are important in this phase of an investigation. See also initial data analysis.

Data set:

A general term for observations and measurements collected during any type of scientific investigation.

Data smoothing algorithms:

Procedures for extracting a pattern in a sequence of observations when this is obscured by noise. Basically any such technique separates the original series into a smooth sequence and a residual sequence (commonly called the ‘rough’). For example, a smoother can separate seasonal fluctuations from briefer events such as identifiable peaks and random noise. A simple example of such a procedure is the moving average; a more complex one is locally weighted regression. See also Kalman filter and spline function.
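The simple moving average mentioned above can be sketched directly; the smoother splits the series into a smooth sequence and the residual 'rough'. The window length and endpoint handling below are illustrative choices.

```python
def moving_average(series, window=3):
    """Separate a series into a smooth part (centred moving average)
    and a residual 'rough' part. Endpoints are left unsmoothed."""
    half = window // 2
    smooth = list(series)
    for i in range(half, len(series) - half):
        smooth[i] = sum(series[i - half:i + half + 1]) / window
    rough = [y - s for y, s in zip(series, smooth)]
    return smooth, rough

series = [1, 2, 9, 4, 5, 6, 13, 8, 9]   # a noisy series with two spikes
smooth, rough = moving_average(series)
print(smooth)
print(rough)
```

The spikes at positions 2 and 6 are damped in the smooth sequence and reappear in the rough, which is exactly the separation the entry describes.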

David, Florence Nightingale (1909-1993):

Born near Leominster, Florence David obtained a first degree in mathematics from Bedford College for Women in 1931. She originally applied to become an actuary but had the offer of a post withdrawn when it was discovered that the 'F.N. David' who had applied was a woman. She worked with Karl Pearson at University College, London and was awarded a doctorate in 1938, and also worked closely with Jerzy Neyman both in the United Kingdom and later in Berkeley. During the next 22 years she published eight books and over 80 papers. In 1962 David became Professor of Statistics at University College, London and in 1967 left England to accept a position at the University of California at Riverside where she established the Department of Statistics. She retired in 1977.

Davies-Quade test:

A distribution free method that tests the hypothesis that the common underlying probability distribution of a sample of observations is symmetric about an unknown median.


Debugging:

The process of locating and correcting errors in a computer routine or of isolating and eliminating malfunctions of a computer itself.


Deciles:

The values of a variable that divide its probability distribution or its frequency distribution into ten equal parts.

Decision theory:

A unified approach to all problems of estimation, prediction and hypothesis testing. It is based on the concept of a decision function, which tells the experimenter how to conduct the statistical aspects of an experiment and what action to take for each possible outcome. Choosing a decision function requires a loss function to be defined which assigns numerical values to making good or bad decisions. Explicitly, a general loss function is denoted L(d, θ), expressing how bad it would be to make decision d if the parameter value were θ. A quadratic loss function, for example, is defined as

L(d, θ) = (d − θ)²

and a bilinear loss function as

L(d, θ) = a(θ − d) if d ≤ θ,  L(d, θ) = b(d − θ) if d > θ

where a and b are positive constants.
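Both loss functions are easy to encode; the asymmetric bilinear form below, with a charged for undershooting and b for overshooting, follows the convention assumed in the reconstruction above.

```python
def quadratic_loss(d, theta):
    """Quadratic loss: L(d, theta) = (d - theta)^2."""
    return (d - theta) ** 2

def bilinear_loss(d, theta, a=1.0, b=2.0):
    """Bilinear loss: a*(theta - d) when the decision undershoots
    (d <= theta), b*(d - theta) when it overshoots (d > theta)."""
    return a * (theta - d) if d <= theta else b * (d - theta)

print(quadratic_loss(3, 1))   # (3 - 1)^2 = 4
print(bilinear_loss(1, 3))    # undershoot by 2 -> a * 2 = 2.0
print(bilinear_loss(5, 3))    # overshoot by 2  -> b * 2 = 4.0
```

With b > a, overshooting is penalized more heavily than undershooting, which is the point of the bilinear form.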

Decision tree:

A graphic representation of the alternatives in a decision making problem that summarizes all the possibilities foreseen by the decision maker. For example, suppose we are given the following problem.

A physician must choose between two treatments. The patient is known to have one of two diseases but the diagnosis is not certain. A thorough examination of the patient was not able to resolve the diagnostic uncertainty. The best that can be said is that the probability that the patient has disease A is p.

A simple decision tree for the problem is shown in Fig. 51.


Fig. 51 A simple decision tree.

Deep models:

A term used for those models applied in screening studies that incorporate hypotheses about the disease process that generates the observed events. The aim of such models is to attempt an understanding of the underlying disease dynamics. See also surface models.

De Finetti, Bruno (1906-1985):

Born in Innsbruck, Austria, De Finetti studied mathematics at the University of Milan, graduating in 1927. He became an actuary and then worked at the National Institute of Statistics in Rome, later becoming a professor at the university. De Finetti is now recognized as a leading probability theorist for whom the sole interpretation of probability was a number describing the belief of a person in the truth of a proposition. He coined the aphorism 'probability does not exist', meaning that it has no reality outside an individual's perception of the world. A major contributor to subjective probability and Bayesian inference, De Finetti died on 20 July 1985 in Rome.

DeFries-Fulker analysis:

A class of regression models that can be used to provide possible estimates of the fundamental behavioural genetic constructs, heritability and shared or common environment.

Degenerate distributions:

Special cases of probability distributions in which a random variable's distribution is concentrated at only one point. For example, a discrete uniform distribution when k = 1. Such distributions play an important role in queuing theory.

Degrees of freedom:

An elusive concept that occurs throughout statistics. Essentially the term means the number of independent units of information in a sample relevant to the estimation of a parameter or calculation of a statistic. For example, in a two-by-two contingency table with a given set of marginal totals, only one of the four cell frequencies is free and the table has therefore a single degree of freedom. In many cases the term corresponds to the number of parameters in a model. Also used to refer to a parameter of various families of distributions, for example, Student’s t-distribution and the F- distribution.

Delay distribution:

The probability distribution of the delay in reporting an event.

Particularly important in AIDS research, since AIDS surveillance data need to be appropriately corrected for reporting delay before they can be used to reflect the current AIDS incidence.

Delta(δ) technique:

A procedure that uses the Taylor series expansion of a function of one or more random variables to obtain approximations to the expected value of the function and to its variance. For example, writing a variable x as x = μ + ε where E(x) = μ and E(ε) = 0, Taylor's expansion gives

f(x) = f(μ) + εf′(μ) + (ε²/2)f″(μ) + ⋯

so that, to first order, var[f(x)] ≈ [f′(μ)]²var(x). So if f(x) = ln x then var(ln x) ≈ (1/μ²)var(x).
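The logarithm example is easy to check by simulation; the sample below (normal with hypothetical mean 10 and standard deviation 0.5) is chosen so that the first-order approximation should be accurate.

```python
import math
import random

random.seed(42)

mu, sigma = 10.0, 0.5
x = [random.gauss(mu, sigma) for _ in range(200000)]

def variance(v):
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v) / len(v)

# Empirical variance of ln(x) against the delta-method approximation
empirical = variance([math.log(u) for u in x])
delta_approx = variance(x) / mu ** 2     # var(ln x) ≈ var(x) / mu^2
print(empirical, delta_approx)
```

The two values agree to within a few per cent, as expected when sigma is small relative to mu.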

Deming, Edwards (1900-1993):

Born in Sioux City, Iowa, Deming graduated from the University of Wyoming in 1921 in electrical engineering, received an MS in mathematics and physics from the University of Colorado in 1925 and a Ph.D. in mathematics and physics from Yale University in 1928. He became aware of early work on quality control procedures while working at the Hawthorne plant of the Western Electric Company in Chicago. Deming's interest in statistics grew in the early 1930s and, in 1939, he joined the US Bureau of the Census. During World War II, Deming was responsible for a vast programme throughout the USA teaching the use of sampling plans and control charts but it was in Japan in the 1950s that Deming's ideas about industrial production as a single system involving both the suppliers and manufacturers all aimed at satisfying customer need were put into action on a national scale. In 1960 Deming received Japan's Second Order Medal of the Sacred Treasure and became a national hero. He died on 20 December 1993.


Demography:

The study of human populations with respect to their size, structure and dynamics, by statistical methods.

De Moivre, Abraham (1667-1754):

Born in Vitry, France, de Moivre came to England in c. 1686 to avoid religious persecution as a Protestant and earned his living at first as a travelling teacher of mathematics, and later in life sitting daily in Slaughter's Coffee House in Long Acre, at the beck and call of gamblers, who paid him a small sum for calculating odds. A close friend of Isaac Newton, de Moivre reached the normal curve as the limit to the skew binomial and gave the correct measure of dispersion √(np(1 − p)). He also considered the concept of independence and arrived at a reasonable definition. His principal work, The Doctrine of Chances, which was on probability theory, was published in 1718. Just before his death in 1754 the French Academy elected him a foreign associate of the Academy of Science.

De Moivre-Laplace theorem:

This theorem states that if X is a random variable having the binomial distribution with parameters n and p, then the asymptotic distribution of X is a normal distribution with mean np and variance np(1 — p). See also normal approximation. [KA1 Chapter 5.]
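The quality of this normal approximation is easy to check numerically; the parameter values below (n = 100, p = 0.5, k = 55) are illustrative, and a continuity correction of 0.5 is applied as is usual.

```python
import math

n, p, k = 100, 0.5, 55

# Exact binomial tail probability P(X <= k)
exact = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# De Moivre-Laplace: X is approximately N(np, np(1-p));
# apply a continuity correction of 0.5
mean, var = n * p, n * p * (1 - p)
z = (k + 0.5 - mean) / math.sqrt(var)
approx = 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF

print(round(exact, 4), round(approx, 4))
```

For n this large the exact and approximate probabilities agree to about two or three decimal places.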


Dendrogram:

A term usually encountered in the application of agglomerative hierarchical clustering methods, where it refers to the 'tree-like' diagram illustrating the series of steps taken by the method in proceeding from n single-member 'clusters' to a single group containing all n individuals. The example shown (Fig. 52) arises from applying single linkage clustering to the following matrix of Euclidean distances between five points:


Density estimation:

Procedures for estimating probability distributions without assuming any particular functional form. Constructing a histogram is perhaps the simplest example of such estimation, and kernel density estimators provide a more sophisticated approach. Density estimates can give valuable indication of such features as skewness and multimodality in the data.
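A kernel density estimator of the kind mentioned above can be written in a few lines. The Gaussian kernel, bandwidth and bimodal sample below are illustrative choices.

```python
import math

def kde(x, data, bandwidth=1.0):
    """Gaussian kernel density estimate at the point x."""
    n = len(data)
    s = sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
    return s / (n * bandwidth * math.sqrt(2 * math.pi))

data = [1.0, 1.5, 2.0, 2.2, 7.8, 8.1, 8.4]   # a clearly bimodal sample

# A density estimate should integrate to (approximately) one
grid = [i * 0.05 for i in range(-200, 400)]
area = sum(kde(x, data) for x in grid) * 0.05
print(round(area, 3))
```

Evaluating the estimate across the grid also reveals the two modes near 1.7 and 8.1, the kind of multimodality the entry says such estimates can expose.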


Fig. 52 A dendrogram for the example matrix.

Density sampling:

A method of sampling controls in a case-control study which can reduce bias from changes in the prevalence of exposure during the course of a study. Controls are sampled from the population at risk at the times of incidence of each case.

Denton method:

A widely used method for benchmarking a time series to annual benchmarks while preserving as far as possible the month-to-month movement of the original series.

Descriptive statistics:

A general term for methods of summarizing and tabulating data that make their main features more transparent. For example, calculating means and variances and plotting histograms. See also exploratory data analysis and initial data analysis.

Design effect:

The ratio of the variance of an estimator under the particular sampling design used in a study to its variance at equivalent sample size under simple random sampling without replacement.

Design matrix:

Used generally for a matrix that specifies a statistical model for a set of observations. For example, in a one-way design with three observations in one group, two observations in a second group and a single observation in the third group, and where the model is

y_ij = μ + α_i + ε_ij

the design matrix, X, is

X =
1 1 0 0
1 1 0 0
1 1 0 0
1 0 1 0
1 0 1 0
1 0 0 1

Using this matrix the model for all the observations can be conveniently expressed in matrix form as

y = Xβ + ε

where β′ = [μ, α₁, α₂, α₃] and ε′ = [ε₁₁, ε₁₂, ε₁₃, ε₂₁, ε₂₂, ε₃₁]. Also used specifically for the matrix X in designed industrial experiments which specifies the chosen values of the explanatory variables; these are often selected using one or other criteria of optimality. See also multiple regression.

Design regions:

Regions relevant to an experiment which are defined by specification of intervals of interest on the explanatory variables. For quantitative variables the most common region is that corresponding to lower and upper limits for the explanatory variables, which depend upon the physical limitations of the system and upon the range of values thought by the experimenter to be of interest. [Journal of the Royal Statistical Society, Series B, 1996, 58, 59-76.]

Design rotatability:

A term used in applications of response surface methodology for the requirement that the quality of the derived predictor of future response values is roughly the same throughout the region of interest. More formally, a rotatable design is one for which N var(ŷ(x))/σ² has the same value at any two locations that are the same distance from the design centre. [Journal of the Royal Statistical Society, Series B, 1996, 58, 59-76.]

Design set:

Synonym for training set.


Determinant:

A value associated with a square matrix that represents sums and products of its elements. For example, if the matrix is

A = ( a  b )
    ( c  d )

then the determinant of A (conventionally written as det(A) or |A|) is given by

det(A) = ad − bc
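The 2 × 2 rule generalizes to larger matrices via Laplace (cofactor) expansion, sketched below along the first row; this is a textbook illustration rather than an efficient method for large matrices.

```python
def det(m):
    """Determinant of a square matrix (list of rows) by Laplace
    expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        # Minor: delete row 0 and column j, with alternating signs
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

print(det([[1, 2], [3, 4]]))   # 1*4 - 2*3 = -2
```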

Deterministic model:

One that contains no random or probabilistic elements. See also random model.


DETMAX algorithm:

An algorithm for constructing exact D-optimal designs.


Detrending:

A term used in the analysis of time series data for the process of calculating a trend in some way and then subtracting the trend values from those of the original series. Often needed to achieve stationarity before fitting models to time series. See also differencing. [Journal of Applied Economics, 2003, 18, 271-89.]


Deviance:

A measure of the extent to which a particular model differs from the saturated model for a data set. Defined explicitly in terms of the likelihoods of the two models as

D = −2[ln Lc − ln Ls]
where Lc and Ls are the likelihoods of the current model and the saturated model, respectively. Large values of D are encountered when Lc is small relative to Ls, indicating that the current model is a poor one. Small values of D are obtained in the reverse case. The deviance has asymptotically a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters in the two models. See also G2 and likelihood ratio. [GLM Chapter 2.]
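One concrete case is the deviance for a Poisson model, where the saturated model sets each fitted mean equal to the observation itself and the general definition reduces to a closed form; the data below are hypothetical.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance D = -2[ln Lc - ln Ls]: the saturated model
    fits each observation exactly, giving D = 2*sum[y*ln(y/mu) - (y - mu)]."""
    d = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d += 2 * (term - (yi - mi))
    return d

y = [2, 5, 3, 8]
print(poisson_deviance(y, [4.5, 4.5, 4.5, 4.5]))  # constant fit: D > 0
print(poisson_deviance(y, y))                     # saturated fit: D = 0
```

As the entry states, the deviance is zero when the current model reproduces the saturated fit and grows as the current model's likelihood falls short of it.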

Deviance information criterion (DIC):

A goodness of fit measure similar to Akaike's information criterion which arises from consideration of the posterior expectation of the deviance as a measure of fit and the effective number of parameters as a measure of complexity. Widely used for comparing models in a Bayesian framework. [Journal of Business and Economic Statistics, 2004, 22, 107-20.]

Deviance residuals:

The signed square root of an observation’s contribution to total model deviance.


Deviate:

The value of a variable measured from some standard point of location, usually the mean.

DeWitt, Johan (1625-1672):

Born in Dordrecht, Holland, DeWitt entered Leyden University at the age of 16 to study law. Contributed to actuarial science and economic statistics before becoming the most prominent Dutch statesman of the third quarter of the seventeenth century. DeWitt died in The Hague on 20 August 1672.


df:

Abbreviation for degrees of freedom.


DFBETA:

An influence statistic which measures the impact of a particular observation, i, on a specific estimated regression coefficient, β̂_j, in a multiple regression. The statistic is the standardized change in β̂_j when the ith observation is deleted from the analysis; it is defined explicitly as

DFBETA_ij = (β̂_j − β̂_j(i)) / (s_(i)√c_j)

where β̂_j(i) is the estimate of the jth coefficient with observation i omitted, s_(i) is the residual mean square obtained from the regression analysis with observation i omitted, and c_j is the (j + 1)th diagonal element of (X′X)⁻¹ with X being the matrix appearing in the usual formulation of this type of analysis. See also Cook's distance, DFFITS and COVRATIO.


DFFITS:

An influence statistic that is closely related to Cook's distance, which measures the impact of an observation on the predicted response value of that observation obtained from a multiple regression. Defined explicitly as

DFFITS_i = (ŷ_i − ŷ_(i)) / (s_(i)√h_i)

where ŷ_i is the predicted response value for the ith observation obtained in the usual way, ŷ_(i) is the corresponding value obtained when observation i is not used in estimating the regression coefficients, s_(i) is the residual mean square obtained from the regression analysis performed with the ith observation omitted, and h_i is the ith diagonal element of the hat matrix H. The relationship of this statistic to Cook's distance, D_i, is


D_i = DFFITS_i² s²_(i) / (s² tr(H))

where H is the hat matrix with diagonal elements h_i and s² is the residual mean square obtained from the regression analysis including all observations. Absolute values of the statistic larger than 2√(tr(H)/n) indicate those observations that give most cause for concern.
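For simple linear regression the statistic can be computed by brute-force case deletion, which makes the definition concrete; the data below are hypothetical, with a deliberately anomalous high-leverage final point.

```python
import math

def fit(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b * xbar, b

def dffits(xs, ys):
    """DFFITS_i = (yhat_i - yhat_(i)) / (s_(i) * sqrt(h_i)),
    computed by refitting with each observation deleted in turn."""
    n = len(xs)
    a, b = fit(xs, ys)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    out = []
    for i in range(n):
        xs_i, ys_i = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        ai, bi = fit(xs_i, ys_i)
        # residual mean square without observation i (p = 2 parameters)
        s2_i = sum((y - (ai + bi * x)) ** 2
                   for x, y in zip(xs_i, ys_i)) / (n - 1 - 2)
        h_i = 1 / n + (xs[i] - xbar) ** 2 / sxx   # leverage of observation i
        out.append(((a + b * xs[i]) - (ai + bi * xs[i]))
                   / math.sqrt(s2_i * h_i))
    return out

xs = [1, 2, 3, 4, 5, 6, 10]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 2.0]   # last point is anomalous
vals = dffits(xs, ys)
print([round(v, 2) for v in vals])
```

The final observation, far from the trend of the others and at high leverage, produces by far the largest |DFFITS|, well past the usual 2√(p/n) flag.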

Diagnostic key:

A sequence of binary or polytomous tests applied sequentially in order to identify the population of origin of a specimen.


Diagnostics:

Procedures for identifying departures from assumptions when fitting statistical models. See, for example, DFBETA and DFFITS.

Diagnostic tests:

Procedures used in clinical medicine and also in epidemiology, to screen for the presence or absence of a disease. In the simplest case the test will result in a positive (disease likely) or negative (disease unlikely) finding. Ideally, all those with the disease should be classified by the test as positive and all those without the disease as negative. Two indices of the performance of a test which measure how often such correct classifications occur are its sensitivity and specificity. See also believe the positive rule, positive predictive value and negative predictive value. [SMR Chapter 14.]

Diagonal matrix:

A square matrix whose off-diagonal elements are all zero. For example,

A = ( a  0  0 )
    ( 0  b  0 )
    ( 0  0  c )

Diary survey:

A form of data collection in which respondents are asked to write information at regular intervals or soon after a particular event has occurred.


DIC:

Abbreviation for deviance information criterion.

Dichotomous variable:

Synonym for binary variable.

Dieulefait, Carlos Eugenio (1901-1982):

Born in Buenos Aires, Dieulefait graduated from the Universidad del Litoral in 1922. In 1930 he became first director of the Institute of Statistics established by the University of Buenos Aires. For the next 30 years, Dieulefait successfully developed statistics in Argentina while also making his own contributions to areas such as correlation theory and multivariate analysis. He died on 3 November 1982 in Rosario, Argentina.

Differences vs totals plot:

A graphical procedure most often used in the analysis of data from a two-by-two crossover design. For each subject the difference between the response variable values on the two treatments is plotted against the total of the two treatment values. The two groups corresponding to the order in which the treatments were given are differentiated on the plot by different plotting symbols. A large shift between the groups in the horizontal direction implies a differential carryover effect. If this shift is small, then the shift between the groups in a vertical direction is a measure of the treatment effect. An example of this type of plot appears in Fig. 53. [The Statistical Consultant in Action, 1987, edited by D.J. Hand and B.S. Everitt, Cambridge University Press, Cambridge.]


Differencing:

A simple approach to removing trends in time series. The first difference of a time series, {y_t}, is defined as the transformation

∇y_t = y_t − y_(t−1)

Fig. 53 An example of a difference versus total plot.


Higher-order differences are defined by repeated application. So, for example, the second difference, ∇²y_t, is given by

∇²y_t = ∇(∇y_t) = y_t − 2y_(t−1) + y_(t−2)
Frequently used in applications to achieve stationarity before fitting models. See also backward shift operator and autoregressive integrated moving average models.
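Repeated differencing is a one-line transformation; applying it to a series with a quadratic trend, as below, shows how a polynomial trend of degree d is removed by differencing d times.

```python
def difference(series, order=1):
    """Apply the first-difference transformation y_t - y_{t-1}
    the given number of times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

y = [t ** 2 for t in range(8)]     # quadratic trend: 0, 1, 4, 9, ...
print(difference(y))               # first differences: 1, 3, 5, ...
print(difference(y, order=2))      # second differences are constant: 2, 2, ...
```

Note that each pass shortens the series by one observation.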

Diggle-Kenward model for dropouts:

A model applicable to longitudinal data in which the dropout process may give rise to informative missing values. Specifically if the study protocol specifies a common set of n measurement times for all subjects, and D is used to represent the subject’s dropout time, with D = d if the values corresponding to times d, d + 1,…, n are missing and D = n + 1 indicating that the subject did not drop out, then a statistical model involves the joint distribution of the observations Y and D. This joint distribution can be written in two equivalent ways,

f(y, d) = f(y)g(d | y) = f(y | d)g(d)

Models derived from the first factorization are known as selection models and those derived from the second factorization are called pattern mixture models. The Diggle-Kenward model is an example of the former, which specifies a multivariate normal distribution for f(y) and a logistic regression for g(d|y). Explicitly, if p_t(y) denotes the conditional probability of dropout at time t, given Y = y, then


When the dropout mechanism is informative the probability of dropout at time t can depend on the unobserved yt. See also missing values.

Digit preference:

The personal and often subconscious bias that frequently occurs in the recording of observations. Usually most obvious in the final recorded digit of a measurement. Figure 54 illustrates the phenomenon. [SMR Chapter 7.]
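The phenomenon is easy to check in a data set by tallying final recorded digits, which should be roughly uniform in the absence of preference; the blood-pressure readings below are hypothetical and show the typical excess of 0s and 5s.

```python
from collections import Counter

def final_digit_counts(measurements):
    """Tally the final recorded digit of each measurement; under no
    digit preference the ten digits should be roughly equally frequent."""
    return Counter(str(m)[-1] for m in measurements)

# Hypothetical blood-pressure readings showing a preference for 0 and 5
readings = [120, 125, 130, 118, 135, 140, 122, 145, 150, 115, 160, 137]
print(final_digit_counts(readings))
```

A chi-squared test of the observed digit frequencies against a uniform distribution would give a formal assessment.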
