NAG To Nyquist frequency (Statistics)

NAG:

The Numerical Algorithms Group, which produces many useful subroutines relevant to statistics.

Naor’s distribution:

A discrete probability distribution that arises from the following model:

Suppose an urn contains n balls, of which one is red and the remainder are white. Balls are drawn one at a time; each white ball drawn is replaced by a red ball, and sampling continues until a red ball is drawn. Then the probability distribution of the required number of draws, Y, is

$$P(Y = y) = \frac{(n-1)!\, y}{(n-y)!\, n^{y}}, \qquad y = 1, 2, \ldots, n$$

After n - 1 draws of white balls the urn contains only red balls, so no more than n draws are required.
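The probabilities can be checked against a direct simulation of the urn. This is a minimal sketch using only the Python standard library; `naor_pmf` and `simulate_naor` are illustrative names, and the pmf used is P(Y = y) = (n - 1)! y / ((n - y)! n^y), i.e. y - 1 white draws followed by a red one.

```python
from fractions import Fraction
from math import factorial
import random

def naor_pmf(y, n):
    """P(Y = y): y - 1 white balls are drawn, then a red one."""
    if not 1 <= y <= n:
        return Fraction(0)
    return Fraction(factorial(n - 1) * y, factorial(n - y) * n ** y)

def simulate_naor(n, rng):
    """Number of draws until a red ball appears, replacing each white ball drawn by a red one."""
    red, draws = 1, 0
    while True:
        draws += 1
        if rng.random() < red / n:      # a red ball is drawn
            return draws
        red += 1                        # the white ball drawn is swapped for a red one

n = 5
print(sum(naor_pmf(y, n) for y in range(1, n + 1)))    # 1
rng = random.Random(1)
sims = [simulate_naor(n, rng) for _ in range(100_000)]
print(sims.count(1) / len(sims), float(naor_pmf(1, n)))
```

Exact fractions are used for the pmf so that the probabilities sum to exactly 1.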

National lotteries:

Games of chance held to raise money for particular causes. The first held in the UK took place in 1569, principally to raise money for the repair of the Cinque Ports. There were 400 000 tickets or lots, with prizes in the form of plate, tapestries and money. Nowadays lotteries are held in many countries, with proceeds either used to augment the exchequer or to fund good causes. The current UK version began in November 1994 and consists of selecting six numbers from 49 for a one pound stake. The winning numbers are drawn ‘at random’ using one of several ‘balls-in-drum’ machines. [Chance Rules, 1999, B.S. , Springer-Verlag, New York.]


Nearest-neighbour clustering:

Synonym for single linkage clustering.

Nearest-neighbour methods I:

Methods of discriminant analysis based on studying the training set subjects most similar to the subject to be classified. Classification might then be decided according to a simple majority verdict among those most similar or ‘nearest’ training set subjects, i.e. a subject would be assigned to the group to which the majority of the ‘neighbours’ belonged. Simple nearest-neighbour methods just consider the most similar neighbour. More general methods consider the k nearest neighbours, where k > 1.
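A minimal sketch of the k nearest-neighbour rule, assuming Euclidean distance as the measure of similarity (other dissimilarity measures could equally be used). The function name and the small training set are hypothetical.

```python
from collections import Counter
import math

def knn_classify(training, new_point, k=1):
    """Assign new_point to the majority group among its k nearest training subjects.

    training: list of (feature_vector, group_label) pairs.
    Ties in the vote go to whichever label Counter.most_common lists first.
    """
    neighbours = sorted(training, key=lambda item: math.dist(item[0], new_point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

training = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
            ((3.0, 3.2), "B"), ((3.1, 2.9), "B")]
print(knn_classify(training, (1.1, 1.0), k=1))  # A
print(knn_classify(training, (2.8, 3.0), k=3))  # B (two B neighbours, one A)
```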

Nearest-neighbour methods II:

Methods used in the preliminary investigation of spatial data to assess departures from complete spatial randomness. For example, histograms of distances between nearest neighbours are often useful.
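A sketch of the nearest-neighbour distance histogram for a point pattern, assuming points on the unit square. The final line adds a rough benchmark that is not in the entry above: under complete spatial randomness with intensity λ, the expected nearest-neighbour distance is approximately 1/(2√λ) if edge effects are ignored, so a clearly smaller mean suggests clustering and a clearly larger one suggests regularity.

```python
import math
import random

def nearest_neighbour_distances(points):
    """Distance from each point to its nearest other point (brute force, O(n^2))."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def histogram_counts(values, width):
    """Counts of values falling in the bins [0, width), [width, 2*width), ..."""
    counts = {}
    for v in values:
        b = int(v // width)
        counts[b] = counts.get(b, 0) + 1
    return dict(sorted(counts.items()))

rng = random.Random(0)
# Hypothetical pattern: 200 points placed uniformly on the unit square.
points = [(rng.random(), rng.random()) for _ in range(200)]
d = nearest_neighbour_distances(points)

width = 0.01
for b, c in histogram_counts(d, width).items():
    print(f"{b * width:.2f}-{(b + 1) * width:.2f} {'*' * c}")

# Rough benchmark ignoring edge effects: intensity is 200 points per unit area.
print(sum(d) / len(d), 1 / (2 * math.sqrt(len(points))))
```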

Necessarily empty cells:

Synonym for structural zeros.

Negative binomial distribution:

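The probability distribution of the number of failures, X, before the kth success in a sequence of Bernoulli trials where the probability of success at each trial is p and the probability of failure is q = 1 - p. The distribution is given by

$$P(X = x) = \binom{k+x-1}{x} p^{k} q^{x}, \qquad x = 0, 1, 2, \ldots$$

The mean, variance, skewness and kurtosis of the distribution are as follows:

$$\text{mean} = \frac{kq}{p}, \qquad \text{variance} = \frac{kq}{p^{2}}, \qquad \text{skewness} = \frac{1+q}{\sqrt{kq}}, \qquad \text{kurtosis} = 3 + \frac{p^{2} + 6q}{kq}$$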

Often used to model overdispersion in count data.
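The closed-form moments can be checked numerically from the probabilities P(X = x) = C(k + x - 1, x) p^k q^x. This is a sketch with hypothetical parameter values; the infinite support is truncated at 400, where the remaining probability is negligible for these values.

```python
from math import comb, sqrt

def negbin_pmf(x, k, p):
    """P(X = x): x failures before the k-th success, success probability p."""
    q = 1 - p
    return comb(k + x - 1, x) * p ** k * q ** x

def negbin_moments(k, p):
    """Mean, variance, skewness and kurtosis from the closed-form expressions."""
    q = 1 - p
    return (k * q / p,
            k * q / p ** 2,
            (1 + q) / sqrt(k * q),
            3 + (p ** 2 + 6 * q) / (k * q))

k, p = 3, 0.4                            # hypothetical parameter values
xs = range(400)                          # the support is infinite; truncate it
probs = [negbin_pmf(x, k, p) for x in xs]
mean = sum(x * f for x, f in zip(xs, probs))
var = sum((x - mean) ** 2 * f for x, f in zip(xs, probs))
skew = sum((x - mean) ** 3 * f for x, f in zip(xs, probs)) / var ** 1.5
print(negbin_moments(k, p))              # about (4.5, 11.25, 1.19, 5.09)
print(mean, var, skew)                   # agree with the first three closed forms
```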

Negative exponential distribution:

Synonym for exponential distribution.

Negative hypergeometric distribution:

In sampling from a population consisting of r elements of one kind and N - r of another, if each element drawn is returned together with a further element of the same kind (so that two elements of the kind selected are replaced each time), then the probability of finding x elements of the first kind in a random sample of n elements is given by

$$P(X = x) = \frac{\binom{r+x-1}{x} \binom{N-r+n-x-1}{n-x}}{\binom{N+n-1}{n}}, \qquad x = 0, 1, \ldots, n$$

The mean of the distribution is nr/N and the variance is (nr/N)(1 - r/N)(N + n)/(N + 1). Corresponds to a beta binomial distribution with integral parameter values.
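A sketch that computes the probabilities exactly, confirms the mean and variance formulas, and compares them with a simulation of the sampling scheme (each element drawn goes back with one more of its kind). The values n = 6, r = 3, N = 10 are hypothetical; the pmf as written needs 0 < r < N.

```python
from fractions import Fraction
from math import comb
import random

def neg_hypergeom_pmf(x, n, r, N):
    """P(X = x): x first-kind elements in n draws, each draw returned plus one more of its kind."""
    return Fraction(comb(r + x - 1, x) * comb(N - r + n - x - 1, n - x),
                    comb(N + n - 1, n))

def polya_draw_count(n, r, N, rng):
    """Simulate n draws; the element drawn goes back with an extra one of the same kind."""
    first, second, count = r, N - r, 0
    for _ in range(n):
        if rng.random() < first / (first + second):
            count += 1
            first += 1
        else:
            second += 1
    return count

n, r, N = 6, 3, 10                      # hypothetical values, with 0 < r < N
pmf = [neg_hypergeom_pmf(x, n, r, N) for x in range(n + 1)]
mean = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))
print(sum(pmf), mean, var)              # 1 9/5 504/275
print(Fraction(n * r, N) * (1 - Fraction(r, N)) * Fraction(N + n, N + 1))  # 504/275

rng = random.Random(7)
sims = [polya_draw_count(n, r, N, rng) for _ in range(50_000)]
print(sum(sims) / len(sims))            # close to 9/5 = 1.8
```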

Negative multinomial distribution:

A generalization of the negative binomial distribution in which r > 2 outcomes are possible on each trial and sampling is continued until m outcomes of a particular type are obtained.

Negative predictive value:

The probability that a person having a negative result on a diagnostic test does not have the disease. See also positive predictive value.
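A sketch of two equivalent calculations: directly from counts of test-negative people, and via Bayes' theorem from sensitivity, specificity and prevalence. The second form shows that the negative predictive value of the same test falls as the prevalence of the disease rises. All numbers are hypothetical.

```python
def npv_from_counts(true_negatives, false_negatives):
    """Proportion of test-negative people who do not have the disease."""
    return true_negatives / (true_negatives + false_negatives)

def npv_from_rates(sensitivity, specificity, prevalence):
    """Same quantity via Bayes' theorem, for a population with the given prevalence."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

print(npv_from_counts(900, 10))
for prevalence in (0.01, 0.10, 0.50):
    print(prevalence, round(npv_from_rates(0.90, 0.95, prevalence), 4))
```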

Negative study:

A study that does not yield a statistically significant result.

Neighbourhood controls:

Synonym for community controls.

Nelder-Mead simplex algorithm:

Type of simplex algorithm.

Nelson-Aalen estimator:

A nonparametric estimator of the cumulative hazard function from censored survival data. Essentially a method of moments estimator.
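The estimator adds, at each distinct event time t, the number of events at t divided by the number still at risk just before t. A minimal sketch, assuming the common convention that subjects censored at an event time are still counted as at risk at that time; the data are hypothetical.

```python
from collections import Counter

def nelson_aalen(times, events):
    """Return [(t, H(t))] at each distinct event time.

    times: follow-up times; events: 1 if the event was observed, 0 if censored.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    estimate, cumulative = [], 0.0
    for t in sorted(deaths):
        at_risk = sum(1 for s in times if s >= t)   # under observation just before t
        cumulative += deaths[t] / at_risk
        estimate.append((t, cumulative))
    return estimate

times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
for t, h in nelson_aalen(times, events):
    print(t, round(h, 4))
```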

Nested case-control study:

A commonly used design in epidemiology in which a cohort is followed to identify cases of some disease of interest and the controls are selected for each case from within the cohort for comparison of exposures. The primary advantage of this design is that the exposure information needs to be gathered for only a small proportion of the cohort members, thereby considerably reducing the data collection costs. [Statistics in Medicine, 1993, 12, 1733-46.]

Nested design:

A design in which levels of one or more factors are subsampled within one or more other factors so that, for example, each level of a factor B occurs at only one level of another factor A. Factor B is said to be nested within factor A. An example might be where interest centres on assessing the effect of hospital and doctor on a response variable, patient satisfaction. Each doctor practises at only one hospital, so doctors are nested within hospitals. See also multilevel model.

Nested model:

Synonym for hierarchical model.

Network:

A linked set of computer systems, capable of sharing computer power and/or storage facilities.

Network sampling:

A sampling design in which a simple random sample or stratified sample of sampling units is made and all observational units which are linked to any of the selected sampling units are included. Different observational units may be linked to different numbers of the sampling units. In a survey to estimate the prevalence of a rare disease, for example, a random sample of medical centres might be selected. From the records of each medical centre in the sample, records of the patients treated for the disease of interest could be extracted. A given patient may have been treated at more than one centre and the more centres at which treatment has been received, the higher the inclusion probability for the patient’s records.

Newman-Keuls test:

A multiple comparison test used to investigate in more detail the differences existing between a set of means as indicated by a significant F-test in an analysis of variance. The test proceeds by arranging the means in increasing order and calculating the test statistic

$$S = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s^{2}}{2}\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}}$$

where $\bar{x}_A$ and $\bar{x}_B$ are the two means being compared, $s^2$ is the within groups mean square from the analysis of variance, and $n_A$ and $n_B$ are the numbers of observations in the two groups. Tables of critical values of S are available, these depending on a parameter r that specifies the interval between the ranks of the two means being tested. For example, when comparing the largest and smallest of four means, r = 4, and when comparing the second smallest and smallest means, r = 2. [SMR Chapter 9.]
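A sketch that orders the means and computes, for every pair, the rank span r and the statistic S = (x̄_A - x̄_B) / sqrt((s²/2)(1/n_A + 1/n_B)), listing the widest comparisons first. It only computes the statistics; each S would still be compared with the tabulated critical value for its r, which is not reproduced here. The summary statistics are hypothetical.

```python
import math

def newman_keuls_statistics(means, ns, s2):
    """Return (larger group, smaller group, r, S) for every pair of group means.

    means, ns: group means and group sizes; s2: within groups mean square.
    r counts the means spanned by the pair in the ordered list, so the largest
    versus smallest of four means has r = 4 and adjacent means have r = 2.
    """
    order = sorted(range(len(means)), key=lambda g: means[g])   # increasing means
    results = []
    for lo in range(len(order)):
        for hi in range(lo + 1, len(order)):
            a, b = order[hi], order[lo]
            se = math.sqrt(s2 / 2 * (1 / ns[a] + 1 / ns[b]))
            results.append((a, b, hi - lo + 1, (means[a] - means[b]) / se))
    return sorted(results, key=lambda item: -item[2])            # widest comparisons first

# Hypothetical summary statistics from a one-way analysis of variance.
means = [10.2, 12.5, 11.0, 14.1]
ns = [5, 5, 5, 5]
for a, b, r, s in newman_keuls_statistics(means, ns, s2=2.0):
    print(f"group {a} vs group {b}: r = {r}, S = {s:.3f}")
```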

Newton-Raphson method:

A numerical procedure that can be used to minimize a function f with respect to a set of parameters $\boldsymbol{\theta}' = [\theta_1, \ldots, \theta_m]$. The iterative scheme is

$$\boldsymbol{\theta}_{i+1} = \boldsymbol{\theta}_i - \mathbf{G}^{-1}(\boldsymbol{\theta}_i)\, \mathbf{g}(\boldsymbol{\theta}_i)$$

where $\mathbf{g}(\boldsymbol{\theta}_i)$ is the vector of derivatives of f with respect to $\theta_1, \ldots, \theta_m$ evaluated at $\boldsymbol{\theta}_i$, and $\mathbf{G}(\boldsymbol{\theta}_i)$ is the m × m matrix of second derivatives of f with respect to the parameters, again evaluated at $\boldsymbol{\theta}_i$. Convergence is very fast when θ is close to the minimum, but when this is not so G may become negative definite and the method may fail to converge. A further disadvantage of the method is the need to invert G on each iteration. See also Fisher’s scoring method and simplex method.
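A minimal two-parameter sketch of the iterative scheme, with the gradient and the matrix of second derivatives supplied analytically. The objective f(x, y) = exp(x) - 2x + y² - xy is a hypothetical example whose second-derivative matrix is positive definite near the start point; no safeguard such as a line search is included, so the sketch shares the convergence weakness described above.

```python
import math

def newton_raphson_2d(grad, hess, theta, tol=1e-10, max_iter=50):
    """Minimize f of two parameters by theta_{i+1} = theta_i - G^{-1}(theta_i) g(theta_i).

    grad(theta) returns (g1, g2); hess(theta) returns ((a, b), (c, d)).
    """
    for _ in range(max_iter):
        g1, g2 = grad(theta)
        (a, b), (c, d) = hess(theta)
        det = a * d - b * c
        if det == 0:
            raise ZeroDivisionError("matrix of second derivatives is singular")
        # G^{-1} g for a 2 x 2 matrix, written out explicitly.
        step1 = (d * g1 - b * g2) / det
        step2 = (a * g2 - c * g1) / det
        theta = (theta[0] - step1, theta[1] - step2)
        if math.hypot(step1, step2) < tol:
            break
    return theta

def grad(t):
    """First derivatives of the hypothetical f(x, y) = exp(x) - 2x + y^2 - xy."""
    return (math.exp(t[0]) - 2 - t[1], 2 * t[1] - t[0])

def hess(t):
    """Second derivatives of the same f."""
    return ((math.exp(t[0]), -1.0), (-1.0, 2.0))

x, y = newton_raphson_2d(grad, hess, (0.0, 0.0))
print(x, y, grad((x, y)))               # gradient is essentially zero at the minimum
```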

Neyman, Jerzy (1894-1981):

Neyman was born in Bendery, Moldavia; his paternal grandfather was a Polish nobleman and a revolutionary who was burned alive in his house during the 1863 Polish uprising against the Russians. His doctoral thesis at the University of Warsaw was on probabilistic problems in agricultural experiments. Until 1934 he worked in Poland, though making academic visits to France and England. Between 1928 and 1933 he developed, in collaboration with Egon Pearson, a firm basis for the theory of hypothesis testing, supplying the logical foundation and mathematical rigour that were missing in the early methodology. In 1934, having moved to University College London, Neyman created the theory of survey sampling and also laid the theoretical foundation of modern quality control procedures. In 1938 he emigrated to the USA and moved to Berkeley. Neyman was one of the founders of modern statistics and received the Royal Statistical Society’s Guy medal in gold and, in 1968, the US National Medal of Science. Neyman died on 5 August 1981 in Berkeley.

Neyman-Pearson lemma:

The central tool of the most commonly used approach to hypothesis testing. Suppose the set of values taken by random variables $\mathbf{X}' = [X_1, X_2, \ldots, X_n]$ are represented by points in n-dimensional space (the sample space), and associated with each point x is the value assigned to x by two possible probability distributions $P_0$ and $P_1$ of X. It is desired to select a set $S_0$ of sample points x in such a way that if $P_0(S_0) = \sum_{x \in S_0} P_0(x) = \alpha$, then for any set S satisfying $P_0(S) = \sum_{x \in S} P_0(x) \le \alpha$ one has $P_1(S) \le P_1(S_0)$. The lemma states that the set $S_0 = \{x : r(x) > C\}$ is a solution of the stated problem and that this is true for every value of C, where r(x) is the likelihood ratio, $P_1(x)/P_0(x)$. [Testing Statistical Hypotheses, 2nd edition, 1986, E.L. Lehmann, Wiley, New York.]
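On a small discrete sample space the lemma can be checked by brute force: for several values of C, the likelihood ratio region S0 = {x : P1(x)/P0(x) > C} is compared with every other subset of no larger size under P0. The five-point sample space and both distributions are hypothetical.

```python
from itertools import combinations

# Hypothetical five-point sample space with probabilities under P0 and P1.
P0 = {"a": 0.40, "b": 0.30, "c": 0.15, "d": 0.10, "e": 0.05}
P1 = {"a": 0.05, "b": 0.10, "c": 0.15, "d": 0.30, "e": 0.40}

def prob(P, S):
    return sum(P[x] for x in S)

def lr_region(C):
    """S0 = {x : r(x) > C} with likelihood ratio r(x) = P1(x) / P0(x)."""
    return {x for x in P0 if P1[x] / P0[x] > C}

def best_power(alpha):
    """Largest P1(S) over every subset S with P0(S) <= alpha (brute force)."""
    return max(prob(P1, S)
               for k in range(len(P0) + 1)
               for S in combinations(P0, k)
               if prob(P0, S) <= alpha + 1e-12)   # tolerance for rounding

for C in (0.2, 0.5, 1.0, 2.0, 5.0):
    S0 = lr_region(C)
    size, power = prob(P0, S0), prob(P1, S0)
    print(C, sorted(S0), round(size, 2), round(power, 2), round(best_power(size), 2))
```

In every row the power of S0 equals the best power attainable at that size, as the lemma asserts.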

Nightingale, Florence (1820-1910):

Born in Florence, Italy, Florence Nightingale trained as a nurse at Kaiserswerth and Paris. In the Crimean War (1854) she led a party of 38 nurses to organize a nursing department at Scutari, where she substantially improved the squalid hospital conditions. She devoted much of her life to campaigns to reform the health and living conditions of the British Army, basing her arguments on massive amounts of data carefully collated and tabulated and often presented in the form of pie charts and bar charts. Ahead of her time as an epidemiologist, Florence Nightingale was acutely aware of the need for suitable comparisons when presenting data and of the possibly misleading effects of crude death rates. Well known to the general public as the Lady of the Lamp, but equally deserving of the lesser known alternative accolade, the Passionate Statistician, Florence Nightingale died in London on 13 August 1910.

NLM:

Abbreviation for non-linear mapping.

NOEL:

Abbreviation for no-observed-effect level.

N of 1 clinical trial:

A special case of a crossover design aimed at determining the efficacy of a treatment (or the relative merits of alternative treatments) for a specific patient. The patient is repeatedly given a treatment and placebo, or different treatments, in successive time periods. See also interrupted time series design.

No free lunch theorem:

A theorem concerned with optimization that states (in very general terms) that a general-purpose universal optimization strategy is theoretically impossible, and the only way one strategy can outperform another is if it is specialized to the specific problem under consideration.

Noise:

A stochastic process of irregular fluctuations. See also white noise sequence.

Nominal significance level:

The significance level of a test when its assumptions are valid.

Nominal variable:

Synonym for categorical variable.

Nomograms:

Graphic methods that permit the representation of more than two quantities on a plane surface. The example shown in Fig. 97 is of such a chart for calculating sample size or power.

Noncentral chi-squared distribution:

The probability distribution, f (x), of the sum

$$X = \sum_{i=1}^{\nu} (Z_i + \delta_i)^{2}$$

where $Z_1, \ldots, Z_\nu$ are independent standard normal random variables and $\delta_1, \ldots, \delta_\nu$ are constants. The distribution has ν degrees of freedom and is given explicitly by


Fig. 97 A nomogram for calculating sample size.

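$$f(x) = \frac{e^{-(x+\lambda)/2}}{2^{\nu/2}} \sum_{r=0}^{\infty} \frac{\lambda^{r}\, x^{\nu/2 + r - 1}}{2^{2r}\, r!\, \Gamma(\nu/2 + r)}, \qquad x > 0$$

where $\lambda = \sum_{i=1}^{\nu} \delta_i^{2}$ is known as the noncentrality parameter. Arises as the distribution of sums of squares in analysis of variance when the hypothesis of the equality of group means does not hold.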
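The explicit density is a Poisson mixture: a Poisson(λ/2) weight on r times a central chi-squared density on ν + 2r degrees of freedom, which expands to the series for f(x). This sketch evaluates the density that way (truncating the series) and checks the mean E[X] = ν + λ by simulating the sum of squares directly. The constants δ_i are hypothetical.

```python
import math
import random

def chi2_pdf(x, k):
    """Central chi-squared density with k degrees of freedom (x > 0)."""
    return math.exp((k / 2 - 1) * math.log(x) - x / 2
                    - (k / 2) * math.log(2) - math.lgamma(k / 2))

def ncx2_pdf(x, nu, lam, terms=200):
    """Noncentral chi-squared density as a truncated Poisson(lam / 2) mixture."""
    if x <= 0:
        return 0.0
    if lam == 0:
        return chi2_pdf(x, nu)
    total = 0.0
    for r in range(terms):
        log_weight = -lam / 2 + r * math.log(lam / 2) - math.lgamma(r + 1)
        total += math.exp(log_weight) * chi2_pdf(x, nu + 2 * r)
    return total

def ncx2_draw(deltas, rng):
    """One realization of sum (Z_i + delta_i)^2 with independent standard normal Z_i."""
    return sum((rng.gauss(0.0, 1.0) + d) ** 2 for d in deltas)

deltas = [1.0, 0.5, 2.0]                       # hypothetical constants
nu, lam = len(deltas), sum(d * d for d in deltas)
rng = random.Random(42)
draws = [ncx2_draw(deltas, rng) for _ in range(100_000)]
print(sum(draws) / len(draws), nu + lam)       # simulated mean vs nu + lambda
print(ncx2_pdf(4.0, nu, lam))
```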

Noncentral distributions:

A series of probability distributions, each of which is an adaptation of one of the standard sampling distributions, such as the chi-squared distribution, the F-distribution or Student’s t-distribution, for the distribution of some test statistic under the alternative hypothesis. Such distributions allow the power of the corresponding hypothesis tests to be calculated. See also noncentral chi-squared distribution, noncentral F-distribution and noncentral t-distribution.

Noncentral F-distribution:

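The probability distribution of the ratio of a random variable having a noncentral chi-squared distribution with noncentrality parameter λ, divided by its degrees of freedom ($\nu_1$), to an independent random variable with a chi-squared distribution, also divided by its degrees of freedom ($\nu_2$). Given explicitly by

$$f(x) = \sum_{r=0}^{\infty} \frac{e^{-\lambda/2} (\lambda/2)^{r}}{r!} \, \frac{\nu_1^{\nu_1/2 + r}\, \nu_2^{\nu_2/2}\, x^{\nu_1/2 + r - 1}}{B\!\left(\frac{\nu_1}{2} + r, \frac{\nu_2}{2}\right) (\nu_2 + \nu_1 x)^{(\nu_1 + \nu_2)/2 + r}}, \qquad x > 0$$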

where B is the beta function. The doubly noncentral F-distribution arises from considering the ratio of two noncentral chi-squared variables each divided by their respective degrees of freedom. [KA2 Chapter 23.]

Noncentral hypergeometric distribution:

A probability distribution constructed by supposing that, in sampling without replacement, the probability of drawing, say, a white ball, given that there are X' white and N' - X' black balls, is not $X'/N'$ but $X'[X' + \theta(N' - X')]^{-1}$ with $\theta \neq 1$. [Univariate Discrete Distributions, 2005, N.L. Johnson, A.W. Kemp and S. Kotz, Wiley, New York.]

Noncentral t-distribution:

The probability distribution, f (x), of the ratio

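$$T = \frac{Z + \delta}{\sqrt{W}}$$

where Z is a random variable having a standard normal distribution and W is independently distributed as $\chi^2_\nu/\nu$, a chi-squared variable with ν degrees of freedom divided by ν; δ is a constant. Given explicitly by

$$f(x) = \frac{\nu^{\nu/2}\, e^{-\delta^{2}/2}}{\sqrt{\pi}\, \Gamma(\nu/2)\, (\nu + x^{2})^{(\nu+1)/2}} \sum_{i=0}^{\infty} \Gamma\!\left(\frac{\nu + i + 1}{2}\right) \frac{(\delta x)^{i}}{i!} \left(\frac{2}{\nu + x^{2}}\right)^{i/2}$$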

Non-Gaussian time series:

Time series, often not stationary with respect to both mean and period, which exhibit a non-Gaussian random component.

Non-identified response:

A term used to denote censored observations in survival data that are not independent of the endpoint of interest. Such observations can occur for a variety of reasons:

• Misclassification of the response; e.g. death from cancer, the response of interest, being misclassified as death from another unrelated cause.

• Response occurrence causing prior censoring; e.g. relapse to heroin use causing a subject to quit a rehabilitation study to avoid chemical detection.

Non-informative censoring:

Censored observations that can be considered to have the same probabilities of failure at later times as those individuals remaining under observation.

Non-informative prior distribution:

A prior distribution which is non-committal about a parameter, for example, a uniform distribution.

Non-linear mapping (NLM):

A method for obtaining a low-dimensional representation of a set of multivariate data, which operates by minimizing a function of the differences between the original inter-individual Euclidean distances and those in the reduced dimensional space. The function minimized is essentially a simple sum-of-squares. See also multidimensional scaling and ordination.

Non-linear model:

A model that is non-linear in the parameters, for example

tmp3A8207_thumb

Some such models can be converted into linear models by linearization (the second equation above, for example, by taking logarithms throughout). Those that cannot are often referred to as intrinsically non-linear, although these can often be approximated by linear equations in some circumstances. Parameters in such models usually have to be estimated using an optimization procedure such as the Newton-Raphson method. In such models linear parameters are those for which the second partial derivative of the model function with respect to the parameter is zero (β1 and β3 in the first example above); when this is not the case (for example β2 in the first example above) they are called non-linear parameters.

Non-linear regression:

Synonym for non-linear model.

Nonmasked study:

Synonym for open label study.

Nonmetric scaling:

A form of multidimensional scaling in which only the ranks of the observed dissimilarity coefficients or similarity coefficients are used in producing the required low-dimensional representation of the data. See also monotonic regression.

Nonnegative garrotte:

An approach to choosing subsets of explanatory variables in regression problems that eliminates some variables, ‘shrinks’ the regression coefficients of others (similar to what happens in ridge regression) and gives relatively stable results, unlike many of the usual subset selection procedures. The method operates by finding $\{c_k\}$ to minimize

$$\sum_{i=1}^{n} \left( y_i - \sum_{k} c_k \hat{\beta}_k x_{ik} \right)^{2}$$

where $\{\hat{\beta}_k\}$ are the results of least squares estimation, and y and x represent the response and explanatory variables. The $\{c_k\}$ satisfy the constraints

$$c_k \ge 0, \qquad \sum_{k} c_k \le s$$

The new regression coefficients are $\tilde{\beta}_k(s) = c_k \hat{\beta}_k$. As the garrotte is drawn tighter by decreasing s, more of the $\{c_k\}$ become zero and the remaining non-zero $\tilde{\beta}_k(s)$ are shrunken. In general the procedure produces regression equations having more nonzero coefficients than other subset selection methods, but the loss of simplicity is offset by substantial gains in accuracy.

Non-orthogonal designs:

Analysis of variance designs with two or more factors in which the number of observations is not the same in every cell.

Nonparametric analysis of covariance:

An analysis of covariance model in which the covariate effect is assumed only to be ‘smooth’ rather than of some specific linear or perhaps non-linear form. See also kernel regression smoothing.

Non-randomized clinical trial:

A clinical trial in which a series of consecutive patients receive a new treatment and those that respond (according to some pre-defined criterion) continue to receive it. Those patients that fail to respond receive an alternative, usually the conventional, treatment. The two groups are then compared on one or more outcome variables. One of the problems with such a procedure is that patients who respond may be healthier than those who do not respond, possibly resulting in an apparent but not real benefit of the treatment.

Non-response:

A term generally used for failure to provide the relevant information being collected in a survey. Poor response can be due to a variety of causes, for example, if the topic of the survey is of an intimate nature, respondents may not care to answer particular questions. Since it is quite possible that respondents in a survey differ in some of their characteristics from those who do not respond, a large number of non-respondents may introduce bias into the final results. See also item non-response.

No-observed-effect level (NOEL):

The dose level of a compound below which there is no evidence of an effect on the response of interest.

Norm:

Most commonly used to refer to ‘what is usual’, for example, the range into which body temperatures of healthy adults fall, but also occasionally used for ‘what is desirable’, for example, the range of blood pressures regarded as being indicative of good health.

Normal approximation:

A normal distribution with mean np and variance np(1 - p) that acts as an approximation to a binomial distribution as n, the number of trials, increases. Here p is the probability of a ‘success’ on any trial. See also DeMoivre-Laplace theorem.
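A sketch comparing exact binomial probabilities P(X ≤ k) with the normal approximation for increasing n. It uses a continuity correction (evaluating the normal distribution function at k + 0.5), a common refinement that is an addition here rather than part of the definition above; p = 0.3 is a hypothetical value.

```python
from math import comb
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """P(X <= k) from the normal approximation, with a continuity correction of 0.5."""
    approx = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)
    return approx.cdf(k + 0.5)

p = 0.3
for n in (10, 50, 200):
    k = round(n * p)
    exact, approx = binomial_cdf(k, n, p), normal_approx_cdf(k, n, p)
    print(n, round(exact, 4), round(approx, 4), round(abs(exact - approx), 4))
```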

Normal distribution:

A probability distribution, f(x), of a random variable, X, that is assumed by many statistical methods. Specifically given by

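$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right], \qquad -\infty < x < \infty$$

where μ and σ² are, respectively, the mean and variance of X. This distribution is bell-shaped as shown in the example given in Fig. 98.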

Normal equations:

The linear equations arising in applying least squares estimation to determining the coefficients in a linear model.

Normal equivalent deviate:

A value, xp, corresponding to a proportion, p, that satisfies the following equation

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x_p} e^{-u^{2}/2}\, du = p$$
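The normal equivalent deviate is the inverse of the standard normal distribution function. A sketch using the standard library's `statistics.NormalDist`, with the result checked by applying the distribution function again:

```python
from statistics import NormalDist

std_normal = NormalDist()                  # mean 0, standard deviation 1
for p in (0.5, 0.9, 0.975):
    x_p = std_normal.inv_cdf(p)            # normal equivalent deviate for p
    print(p, round(x_p, 4), round(std_normal.cdf(x_p), 4))
```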

Normality:

A term used to indicate that some variable of interest has a normal distribution.

Normal range:

Synonym for reference interval.

Normal scores:

The expectations of the order statistics of a sample from the standard normal distribution. The basis of probability plots.

Normal scores test:

An alternative to the Mann-Whitney test for comparing populations under shift alternatives.

NORMIX:

A computer program for the maximum likelihood estimation of the parameters in a finite mixture distribution in which the components are multivariate normal distributions with different mean vectors and possibly different variance-covariance matrices.


Fig. 98 A normal distribution with mean 10 and variance 9.

Nuisance parameter:

A parameter of a model in which there is no scientific interest but whose values are usually needed (but in general are unknown) to make inferences about those parameters which are of such interest. For example, the aim may be to draw an inference about the mean of a normal distribution when nothing certain is known about the variance. The likelihood for the mean, however, involves the variance, different values of which will lead to different likelihoods. To overcome the problem, test statistics or estimators for the parameters that are of interest are sought which do not depend on the unwanted parameter(s). See also conditional likelihood.

Null distribution:

The probability distribution of a test statistic when the null hypothesis is true.

Null hypothesis:

The ‘no difference’ or ‘no association’ hypothesis to be tested (usually by means of a significance test) against an alternative hypothesis that postulates nonzero difference or association.

Null matrix:

A matrix in which all elements are zero.

Null vector:

A vector the elements of which are all zero.

Number needed to treat:

The reciprocal of the reduction in absolute risk between treated and control groups in a clinical trial. It is interpreted as the number of patients who need to be treated to prevent one adverse event.
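A sketch of the calculation from trial counts, using exact fractions to avoid rounding error. Rounding the result up to a whole number of patients is a common reporting convention, shown here as an assumption rather than part of the definition; the trial counts are hypothetical.

```python
from fractions import Fraction
import math

def number_needed_to_treat(events_control, n_control, events_treated, n_treated):
    """Reciprocal of the absolute risk reduction, using exact fractions."""
    arr = Fraction(events_control, n_control) - Fraction(events_treated, n_treated)
    if arr <= 0:
        raise ValueError("treatment shows no absolute risk reduction")
    return 1 / arr

# Hypothetical trial: 40/200 adverse events on control, 30/200 on treatment.
nnt = number_needed_to_treat(40, 200, 30, 200)
print(nnt, math.ceil(nnt))    # 20 20
```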

Number of partitions:

A general expression for the number of partitions, N, of n individuals or objects into g groups is given by

$$N = \frac{1}{g!} \sum_{m=1}^{g} (-1)^{g-m} \binom{g}{m} m^{n}$$

For example, when n = 20 and g = 4, N is 45 232 115 901.
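The expression is the Stirling number of the second kind, and integer arithmetic evaluates it exactly. A minimal sketch (the function name is illustrative) showing how quickly N grows with n:

```python
from math import comb, factorial

def number_of_partitions(n, g):
    """Partitions of n objects into g non-empty groups (Stirling number of the second kind)."""
    total = sum((-1) ** (g - m) * comb(g, m) * m ** n for m in range(1, g + 1))
    return total // factorial(g)

print(number_of_partitions(20, 4))   # 45232115901
print(number_of_partitions(25, 4))   # 46771289738810
```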

Numerical integration:

The study of how the numerical value of an integral can be found.

Also called quadrature, a term that refers to finding a square whose area is the same as the area under a curve. See also Simpson’s rule and trapezoidal rule.

Numerical taxonomy:

In essence a synonym for cluster analysis.

Nuremberg code:

A list of ten standards for carrying out clinical research involving human subjects, drafted after the trials of Nazi war criminals at Nuremberg. See also Helsinki declaration.

Nyquist frequency:

The frequency above which there is no information in a continuous time series that has been digitized by taking values at time intervals δt apart. Explicitly the frequency is 1/(2δt) cycles per unit time.
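A sketch of why no information survives above the Nyquist frequency: with sampling interval δt, a cosine at frequency f above 1/(2δt) gives exactly the same sampled values as one at 1/δt - f (its mirror image below the Nyquist frequency), so the two cannot be told apart. The interval and frequencies are hypothetical.

```python
import math

dt = 0.01                                  # sampling interval (hypothetical)
nyquist = 1 / (2 * dt)                     # 50 cycles per unit time

def sampled_cosine(freq, n_samples, dt):
    """Values of cos(2*pi*freq*t) at t = 0, dt, 2*dt, ..."""
    return [math.cos(2 * math.pi * freq * k * dt) for k in range(n_samples)]

high = 70.0                                # above the Nyquist frequency
alias = 2 * nyquist - high                 # 30: its mirror image below the Nyquist frequency
diff = max(abs(a - b) for a, b in zip(sampled_cosine(high, 50, dt),
                                       sampled_cosine(alias, 50, dt)))
print(nyquist, alias, diff)                # the two sampled series agree up to rounding error
```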
