H0: To Hypothesis testing (Statistics)

H0:

Symbol for null hypothesis.

H1:

Symbol for alternative hypothesis.

Hadamard matrix:

An n × n matrix $H_n$ consisting entirely of ±1s and such that $H_n^{\prime} H_n = \mathrm{diag}(n, n, \ldots, n)$. Important in response surface methodology.
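The defining property can be checked numerically. The sketch below is only an illustration: it builds matrices of order 2^k by the Sylvester doubling construction (a standard construction that the entry itself does not mention) and forms the product of the transpose with the matrix. The function names are hypothetical.

```python
def sylvester_hadamard(k):
    """Return a Hadamard matrix of order 2**k as a list of rows.

    Uses the Sylvester doubling step H -> [[H, H], [H, -H]].
    """
    h = [[1]]
    for _ in range(k):
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def gram(h):
    """Compute H'H for a square matrix given as a list of rows."""
    n = len(h)
    return [[sum(h[r][i] * h[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]


H4 = sylvester_hadamard(2)   # order n = 4
G = gram(H4)                 # should equal diag(4, 4, 4, 4)
```

Every off-diagonal element of `G` is zero because distinct columns of a Hadamard matrix are orthogonal.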

Haenszel, William (1910-1998):

Born in Rochester, New York, Haenszel received a B.A. degree in sociology and mathematics from the University of Buffalo in 1931, and an M.A. degree in statistics in 1932. For the next 20 years he worked first as a statistician at the New York State Department of Health and then as Director of the Bureau of Vital Statistics at the Connecticut State Department of Health. In 1952 Haenszel became Head of the Biometry Section of the National Cancer Institute where he stayed until his retirement in 1976. Haenszel made important contributions to epidemiology including the Mantel-Haenszel test. He died on 13 March 1998, in Wheaton, Illinois.

Haldane’s estimator:

An estimator of the odds ratio given by

$$\hat{\psi} = \frac{(a + \tfrac{1}{2})(d + \tfrac{1}{2})}{(b + \tfrac{1}{2})(c + \tfrac{1}{2})}$$

where a, b, c and d are the cell frequencies in the two-by-two contingency table of interest. See also Jewell’s estimator.
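A minimal sketch of the calculation, assuming the usual form of the estimator in which one half is added to each cell frequency before the cross-product ratio is taken. The function name is illustrative only.

```python
def haldane_odds_ratio(a, b, c, d):
    """Odds ratio estimate with 1/2 added to every cell of the 2 x 2 table.

    a, b are the frequencies in the first row and c, d those in the second.
    The correction keeps the estimate finite when a cell count is zero.
    """
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# A table with an empty cell: the uncorrected ratio ad/bc would be undefined.
estimate = haldane_odds_ratio(10, 0, 5, 5)
```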

Half-mode:

A term sometimes used for the mode of a probability distribution or a frequency distribution if this occurs at an extreme value of the variable, for example, in the case of a J-shaped distribution.

Half-normal distribution:

Synonym for folded normal distribution.

Half-normal plot:

A plot for diagnosing model inadequacy or revealing the presence of outliers, in which the absolute values of, for example, the residuals from a multiple regression are plotted against the quantiles of the standard normal distribution. Outliers will appear at the top right of the plot as points that are separated from the others, while systematic departures from a straight line could indicate that the model is unsatisfactory.

Halo effect:

The tendency of a subject’s performance on some task to be overrated because of the observer’s perception of the subject ‘doing well’ gained in an earlier exercise or when assessed in a different area.

Halperin, Max (1917-1988):

Born in Omaha, Nebraska, Halperin graduated from the University of Omaha in 1940 with a degree in mathematics. In 1950 he obtained a Ph.D. degree from the University of North Carolina. His career began as a research mathematician at the RAND Corporation, and posts at the National Institutes of Health and the Division of Biologic Standards followed. Later he became Research Professor of Statistics and Director of the Biostatistics Center of the Department of Statistics at the George Washington University. Halperin made important contributions in many areas of statistics including multivariate analysis, regression analysis, multiple comparisons and the detection of outliers. He died on 1 February 1988 in Fairfax, Virginia.

Hanging rootogram:

A diagram comparing an observed rootogram with a fitted curve, in which differences between the two are displayed in relation to the horizontal axis, rather than to the curve itself. This makes it easier to spot large differences and to look for patterns. An example is given in Fig. 71.

Hankel matrix:

A variance-covariance matrix between past and present values of a time series with some future values of the series at some time t.

Hannan, E.J. (1921-1994):

Having obtained a commerce degree at the University of Melbourne, Hannan began his career as an economist in the Reserve Bank of Australia. When spending a year at the Australian National University in 1953 he was ‘spotted’ by P.A.P. Moran and began his research on various aspects of the analysis of time series which eventually brought him international recognition. Hannan received many honours including the Lyle medal for research in mathematics and physics from the Australian Academy of Science and the Pitman medal from the Statistical Society of Australia. He died on 7 January 1994.

Fig. 71 An example of a hanging rootogram.

Hansen-Hurwitz estimator:

An unbiased estimator of the total size of a population given by

$$\hat{\tau} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}$$

where n is the number of sampling units (often regions or areas in this context), $y_i$ is the number of individuals, animals or species observed in the ith sampling unit, and $p_i$ is the probability of selecting the ith unit, for i = 1, 2, …, N, where N is the number of population units. (When sampling is with replacement n = N.) An unbiased estimator of the variance of $\hat{\tau}$ is

$$\widehat{\mathrm{var}}(\hat{\tau}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i} - \hat{\tau}\right)^2$$
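The following sketch assumes the standard form of the estimator, namely the average of the ratios of each observed count to its selection probability, with the variance estimated from the spread of those ratios about the estimate. The function name and the data are hypothetical.

```python
def hansen_hurwitz(y, p):
    """Return (estimated total, estimated variance of that total).

    y[i] is the count observed in the i-th draw and p[i] the probability
    of selecting that unit on a single draw.
    """
    n = len(y)
    if n < 2:
        raise ValueError("at least two draws are needed for the variance")
    ratios = [yi / pi for yi, pi in zip(y, p)]
    total = sum(ratios) / n
    variance = sum((r - total) ** 2 for r in ratios) / (n * (n - 1))
    return total, variance


# Counts exactly proportional to the selection probabilities, so every
# ratio y/p equals 100 and the variance estimate is zero.
total, variance = hansen_hurwitz([10, 20, 30], [0.1, 0.2, 0.3])
```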

Hansen, Morris Howard (1910-1990):

Born in Thermopolis, Wyoming, Hansen studied accounting at the University of Wyoming obtaining a B.S. degree in 1934. His formal training in statistics consisted of after-hours classes taken at the Graduate School of the U.S. Department of Agriculture. Hansen received a master’s degree in statistics in 1940. In 1941 he joined the Census Bureau where he developed the mathematical theory underlying sampling methods. This resulted in the publication in 1953 of the standard reference work, Sample Survey Methods and Theory. Elected a member of the National Academy of Sciences in 1976, Hansen died on 9 October 1990.

Haplotype:

A combination of two or more alleles that are present in the same gamete.

Haplotype analysis:

The analysis of haplotype frequencies in one or more populations, with the aim of establishing associations between two or more alleles, between a haplotype and a phenotypic trait, or the genetic relationship between populations.

Hardy-Weinberg law:

The law stating that both gene frequencies and genotype frequencies will remain constant from generation to generation in an infinitely large interbreeding population in which mating is at random and there is no selection, migration or mutation. In a situation where a single pair of alleles (A and a) is considered, the frequencies of germ cells carrying A and a are defined as p and q, respectively. At equilibrium the frequencies of the genotype classes are $p^2$ (AA), $2pq$ (Aa) and $q^2$ (aa).
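As a small illustration, the sketch below computes the equilibrium genotype frequencies from the frequency p of allele A (with q = 1 − p) and then recovers the allele frequency from the genotypes, showing that it is unchanged. The function names are illustrative.

```python
def hardy_weinberg(p):
    """Equilibrium genotype frequencies for allele frequencies p (A) and q = 1 - p (a)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


def allele_frequency(genotypes):
    """Frequency of allele A among the gametes produced by these genotypes."""
    # Every AA individual passes on A; an Aa individual does so half the time.
    return genotypes["AA"] + 0.5 * genotypes["Aa"]


frequencies = hardy_weinberg(0.7)
next_p = allele_frequency(frequencies)   # equals 0.7 again
```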

Harmonic analysis:

A method of determining the period of the cyclical term $S_t$ in a time series of the form

$$X_t = S_t + \epsilon_t$$

where $\epsilon_t$ represents the random fluctuations of the series about $S_t$. The cyclical term is represented as a sum of sine and cosine terms so that $X_t$ becomes

$$X_t = \sum_{i=1}^{r} \left\{A_i \cos(\omega_i t) + B_i \sin(\omega_i t)\right\} + \epsilon_t$$

For certain series the periodicity of the cyclical component can be specified very accurately, as, for example, in the case of economic or geophysical series which contain a strict 12-month cycle. In such cases, the coefficients $\{A_i\}$, $\{B_i\}$ can be estimated by regression techniques. For many series, however, there may be several periodic terms present with unknown periods, and so not only the coefficients $\{A_i\}$, $\{B_i\}$ have to be estimated but also the unknown frequencies $\{\omega_i\}$. The so-called hidden periodicities can often be determined by examination of the periodogram, which is a plot of $I(\omega)$ against $\omega$ where

$$I(\omega) = \frac{2}{N}\left[\left\{\sum_{t=1}^{N} X_t \cos(\omega t)\right\}^2 + \left\{\sum_{t=1}^{N} X_t \sin(\omega t)\right\}^2\right]$$

and $\omega = 2\pi p/N$, p = 1, 2, …, [N/2]; N is the length of the series. Large ordinates on this plot indicate the presence of a cyclical component at a particular frequency. As an example of the application of this procedure Fig. 72 shows the sunspot series and the periodogram of this series based on 281 observations. It is clear that the ordinate at $\omega = 2\pi \times 28/281$, corresponding to a period of approximately 10 years, is appreciably larger than the other ordinates. If several peaks are observed in the periodogram it cannot be concluded that each of these corresponds to a genuine periodic component in $X_t$, since it is possible that peaks may occur due simply to random fluctuations in the noise term $\epsilon_t$. Various procedures for formally assessing periodic ordinates are available, of which the most commonly used are Schuster’s test, Walker’s test and Fisher’s g statistic. See also spectral analysis, fast Fourier transform and window estimates.
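A hedged sketch of a periodogram calculation follows. It assumes the common scaling in which the ordinate at frequency ω is 2/N times the sum of the squared cosine and sine sums of the series, evaluated at ω = 2πp/N for p = 1, …, [N/2]; other scalings are in use. The series is synthetic and noise-free, and all names are illustrative.

```python
import math


def periodogram(x):
    """Return a list of (omega, I(omega)) pairs at the Fourier frequencies.

    Assumed scaling: I(w) = (2/N) * [(sum x_t cos(w t))^2 + (sum x_t sin(w t))^2],
    with t running from 1 to N.
    """
    n = len(x)
    ordinates = []
    for p in range(1, n // 2 + 1):
        w = 2.0 * math.pi * p / n
        c = sum(xt * math.cos(w * t) for t, xt in enumerate(x, start=1))
        s = sum(xt * math.sin(w * t) for t, xt in enumerate(x, start=1))
        ordinates.append((w, 2.0 / n * (c * c + s * s)))
    return ordinates


# Synthetic series of length 100 containing a single cycle of period 10.
series = [math.cos(2.0 * math.pi * t / 10.0) for t in range(1, 101)]
peak_omega, peak_value = max(periodogram(series), key=lambda pair: pair[1])
peak_period = 2.0 * math.pi / peak_omega
```

The largest ordinate falls at p = 10, that is, at a period of 10 time units. With real data the remaining ordinates would not vanish, which is why formal tests of the peaks are needed.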

Harmonic mean:

The reciprocal of the arithmetic mean of the reciprocals of a set of observations $x_1, x_2, \ldots, x_n$. Specifically, the harmonic mean H is obtained from the equation

$$\frac{1}{H} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{x_i}$$

Used in some methods of analysing non-orthogonal designs. The harmonic mean is either smaller than or equal to the arithmetic mean and the geometric mean.
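The ordering of the three means can be illustrated with a short sketch. The helper names and the data values are illustrative, and positive observations are assumed.

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # Computed on the log scale; requires strictly positive values.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    # Reciprocal of the arithmetic mean of the reciprocals.
    return len(xs) / sum(1.0 / x for x in xs)


values = [1.0, 2.0, 4.0]
h = harmonic_mean(values)     # 3 / 1.75
g = geometric_mean(values)    # cube root of 8
a = arithmetic_mean(values)   # 7 / 3
```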

Harrington and Fleming Gp tests:

A class of linear rank tests for comparing two interval censored samples. Useful in testing for a difference between two or more survival

Fig. 72 Sunspot activity and its periodogram

Harris and Stevens forecasting:

A method of making short term forecasts in a time series that is subject to abrupt changes in pattern and transient effects. Examples of such series are those arising from measuring the concentration of certain biochemicals in biological organisms, or the concentration of plasma growth hormone. The changes are modelled by adding a random perturbation vector having zero mean to a linearly updated parameter vector.

Harris walk:

A random walk on the set of nonnegative integers, for which the matrix of transition probabilities consists of zeros except for the elements;

Hartley, Herman Otto (1912-1980):

Born in Berlin, Hartley obtained a Ph.D. degree in mathematics from the University of Berlin in 1933. In 1934 he emigrated to England where he worked with Egon Pearson in producing the Biometrika Tables for Statisticians. In 1953 Hartley moved to Iowa State College in the USA and from 1963 to 1977 established and ran the Institute of Statistics at Texas A&M University. He contributed to data processing, analysis of variance, sampling theory and sample surveys. Hartley was awarded the S.S. Wilks medal in 1973. He died on 30 December 1980 in Durham, USA.

Hartley’s test:

A simple test of the equality of variances of the populations corresponding to the groups in a one-way design. The test statistic (if each group has the same number of observations) is the ratio of the largest ($s^2_{\text{largest}}$) to the smallest ($s^2_{\text{smallest}}$) within-group variance, i.e.

$$F = \frac{s^2_{\text{largest}}}{s^2_{\text{smallest}}}$$

Critical values are available in many statistical tables. The test is sensitive to departures from normality. See also Bartlett’s test and Box’s test.
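A minimal sketch of the test statistic, taken here as the largest within-group sample variance divided by the smallest, for groups of equal size. The function name and data are illustrative, and the critical value must still be looked up in tables.

```python
from statistics import variance


def hartley_ratio(groups):
    """Ratio of the largest to the smallest within-group sample variance."""
    sizes = {len(g) for g in groups}
    if len(sizes) != 1:
        raise ValueError("this form of the statistic assumes equal group sizes")
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


# Sample variances of the three groups are 1, 4 and 3.
ratio = hartley_ratio([[1, 2, 3], [2, 4, 6], [1, 1, 4]])
```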

Hat matrix:

A matrix, H, arising in multiple regression, which is used to obtain the predicted values of the response variable corresponding to each observed value via the equation

$$\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$$

The matrix H puts the ‘hats’ on the elements of y; it is a symmetric matrix and is also idempotent. Given explicitly in terms of the design matrix, X, as

$$\mathbf{H} = \mathbf{X}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{X}^{\prime}$$

The diagonal elements of H are often useful diagnostically in assessing the results from the analysis. See also Cook’s distance.
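The properties of H can be illustrated in the simplest case. The sketch below assumes a design matrix with an intercept column and a single explanatory variable x, for which the matrix formed from the design matrix reduces to elements 1/n + (x_i − x̄)(x_j − x̄)/S_xx, where S_xx is the sum of squared deviations of x. The function names are hypothetical.

```python
def hat_matrix_simple(x):
    """Hat matrix for a regression with an intercept and one covariate x."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [[1.0 / n + (xi - xbar) * (xj - xbar) / sxx for xj in x] for xi in x]


def apply_hat(h, y):
    """Predicted values: each is a weighted sum of the observed responses."""
    return [sum(hij * yj for hij, yj in zip(row, y)) for row in h]


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]          # lies exactly on the line y = 2x
H = hat_matrix_simple(x)
fitted = apply_hat(H, y)          # reproduces y
leverages = [H[i][i] for i in range(len(x))]   # diagonal elements
```

The diagonal elements sum to the number of fitted parameters (two here), and the extreme x values carry the largest leverages.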

Haugh’s test:

A test for the independence of two time series which is based on the sum of finitely many squares of residual cross-correlations.

Hausdorff dimension:

Synonym for fractal dimension.

Hausman misspecification test:

A test that considers two estimators $\hat{\theta}$ and $\tilde{\theta}$ which are both consistent if the model is correctly specified, but converge to different limits when the model is misspecified. The test statistic used is

$$H = (\hat{\theta} - \tilde{\theta})^{\prime}\,[\mathrm{Cov}(\hat{\theta} - \tilde{\theta})]^{-1}\,(\hat{\theta} - \tilde{\theta})$$

where $\mathrm{Cov}(\hat{\theta} - \tilde{\theta})$ is the variance-covariance matrix of the difference if the model is correctly specified. The test statistic is asymptotically chi-squared distributed with degrees of freedom equal to the rank of $\mathrm{Cov}(\hat{\theta} - \tilde{\theta})$. The test suffers from being sensitive to many types of misspecification and from being difficult to implement, since it requires an estimator of the relevant covariance matrix.

Hawthorne effect:

A term used for the effect that might be produced in an experiment simply from the awareness by the subjects that they are participating in some form of scientific investigation. The name comes from a study of industrial efficiency at the Hawthorne Plant in Chicago in the 1920s.

Hazard function:

The risk that an individual experiences an event (death, improvement etc.) in a small time interval, given that the individual has survived up to the beginning of the interval. It is a measure of how likely an individual is to experience an event as a function of the age of the individual. Usually denoted h(t), the function can be expressed in terms of the probability distribution of the survival times f(t) and the survival function S(t), as h(t) = f(t)/S(t). The hazard function may remain constant, increase, decrease or take on some more complex shape. The function can be estimated as the proportion of individuals experiencing an event in an interval per unit time, given that they have survived to the beginning of the interval, that is

$$\hat{h}(t) = \frac{\text{number of individuals experiencing an event in the interval beginning at } t}{(\text{number of individuals surviving at } t) \times (\text{interval width})}$$

Care is needed in the interpretation of the hazard function because of both selection effects due to variation between individuals and variation within each individual over time. For example, individuals with a high risk are more prone to experience an event early, and those remaining at risk will tend to be a selected group with a lower risk. This will result in the hazard rate being ‘pulled down’ to an increasing extent as time passes. See also survival function, bathtub curve and frailty models.
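The interval estimate described in this entry can be sketched as follows: the number of events in each interval is divided by the number still at risk at the start of the interval and by the interval width. The sketch assumes no censoring, and the function name and counts are illustrative.

```python
def hazard_by_interval(n_start, events, width):
    """Estimated hazard in successive intervals of equal width.

    n_start: number of individuals at risk at time zero
    events:  number of events observed in each successive interval
    width:   common interval width (assumed; censoring is ignored)
    """
    at_risk = n_start
    estimates = []
    for d in events:
        if at_risk <= 0:
            raise ValueError("no individuals left at risk")
        estimates.append(d / (at_risk * width))
        at_risk -= d          # survivors enter the next interval
    return estimates


# 100 individuals; 10, 9 and 9 events in three one-year intervals.
rates = hazard_by_interval(100, [10, 9, 9], 1.0)
```

Here the first two intervals give the same rate (0.1 per year) even though fewer events occur in the second, because fewer individuals remain at risk.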

Hazard plotting:

Based on the hazard function of a distribution, this procedure provides estimates of distribution parameters, the proportion of units failing by a given age, percentiles of the distribution, the behaviour of the failure rate of the units as a function of their age, and conditional failure probabilities for units of any age.

Hazard regression:

A procedure for modelling the hazard function that does not depend on the assumptions made in Cox’s proportional hazards model, namely that the log-hazard function is an additive function of both time and the vector of covariates. In this approach, spline functions are used to model the log-hazard function.

Head-banging smoother:

A procedure for smoothing spatial data. The basic algorithm proceeds as follows:

• for each point or area whose value $y_i$ is to be smoothed, determine the N nearest neighbours to location $x_i$

• from among these N nearest neighbours, define a set of pairs around the point or area, such that each ‘triple’ (pair plus target point at $x_i$) is roughly collinear. Let NTRIP be the maximum number of such triplets

• let $(a_k, b_k)$ denote the (higher, lower) of the two values in the kth pair and let $A = \mathrm{median}\{a_k\}$, $B = \mathrm{median}\{b_k\}$

• the smoothed value corresponding to $y_i$ is $\tilde{y}_i = \mathrm{median}\{A, y_i, B\}$.
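The last two steps of the algorithm can be sketched as below. The sketch assumes the neighbour search and the selection of roughly collinear pairs have already been done, so it takes the pair values as given. The function name is hypothetical.

```python
from statistics import median


def head_banging_value(y, pair_values):
    """Smoothed value for one location.

    y:           the observed value at the target location
    pair_values: list of (u, v) values observed at the two points of each
                 selected pair (pair selection itself is not shown here)
    """
    highs = [max(u, v) for u, v in pair_values]
    lows = [min(u, v) for u, v in pair_values]
    a = median(highs)   # median of the higher values
    b = median(lows)    # median of the lower values
    return median([a, y, b])


pairs = [(1, 3), (2, 5), (4, 2)]
smoothed_outlier = head_banging_value(10, pairs)   # pulled down to 4
smoothed_typical = head_banging_value(3, pairs)    # left at 3
```

A value lying between the two medians is left unchanged, while an extreme value is replaced by the nearer median.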

Healthy worker effect:

The phenomenon whereby employed individuals tend to have lower mortality rates than those unemployed. The effect, which can pose a serious problem in the interpretation of industrial cohort studies, has two main components:

• selection at recruitment to exclude the chronically sick resulting in low standardized mortality rates among recent recruits to an industry,

• a secondary selection process by which workers who become unfit during employment tend to leave, again leading to lower standardized mortality ratios among long-serving employees.

Heckman selection model:

A regression model for situations in which values of the response variable may not be sampled at random from the population of interest. For example, if the response variable is salary and the population women, it may be that women who would have low wages choose not to work and so the sample of observed salaries is biased upwards.

Hellinger distance:

A measure of distance between two probability distributions, f(x) and g(x), given by $\sqrt{2(1 - \rho)}$ where

$$\rho = \int \sqrt{f(x)\,g(x)}\,dx$$
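A small sketch of the distance for discrete distributions, in which a sum over the categories replaces the integral. This discrete analogue is an assumption made for illustration. The quantity ρ is the sum of the square roots of the products of corresponding probabilities, and the distance is the square root of 2(1 − ρ).

```python
import math


def hellinger_distance(f, g):
    """Distance between two discrete distributions on the same categories."""
    if len(f) != len(g):
        raise ValueError("distributions must have the same number of categories")
    rho = sum(math.sqrt(fi * gi) for fi, gi in zip(f, g))
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))


same = hellinger_distance([0.5, 0.5], [0.5, 0.5])        # 0
disjoint = hellinger_distance([1.0, 0.0], [0.0, 1.0])    # sqrt(2), the maximum
between = hellinger_distance([0.5, 0.5], [0.9, 0.1])
```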

Hello-goodbye effect:

A phenomenon originally described in psychotherapy research, but one which may arise whenever a subject is assessed on two occasions, with some intervention between the visits. Before an intervention a person may present himself/herself in as bad a light as possible, thereby hoping to qualify for treatment, and impressing staff with the seriousness of his/her problems. At the end of the study the person may want to ‘please’ the staff with his/her improvement, and so may minimize any problems. The result is to make it appear that there has been some improvement when none has occurred, or to magnify the effects that did occur. [Journal of Clinical Psychology, 2000, 56, 853-9.]

Helmert contrast:

A contrast often used in the analysis of variance, in which each level of a factor is tested against the average of the remaining levels. So, for example, if three groups are involved of which the first is a control, and the other two treatment groups, the first contrast tests the control group against the average of the two treatments and the second tests whether the two treatments differ.
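The contrast coefficients described in this entry can be generated as in the sketch below, which assumes the ordering used in the example (each level compared with the average of the levels that follow it). Scaling conventions differ between packages, and the function name is illustrative.

```python
def helmert_contrasts(g):
    """Coefficients of the g - 1 contrasts for a factor with g levels.

    Row k compares level k with the average of the levels after it.
    """
    rows = []
    for k in range(g - 1):
        remaining = g - k - 1
        rows.append([0.0] * k + [1.0] + [-1.0 / remaining] * remaining)
    return rows


# Three groups: control, treatment 1, treatment 2.
contrasts = helmert_contrasts(3)
```

For three groups the first row is (1, −0.5, −0.5), comparing the control with the average of the two treatments, and the second is (0, 1, −1), comparing the two treatments.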

Helsinki declaration:

A set of principles to guide clinicians on the ethics of clinical trials and other clinical research. See also Nuremberg code.

Herbicide bioassay:

A procedure for establishing a dose-response curve in the development of new herbicides. As no objective death criteria can be given for plants, a graded response such as biomass or height reduction must be considered.

Herfindahl index:

An index of industry concentration given by

$$H = \sum_{i=1}^{n}\left(\frac{s_i}{S}\right)^2$$

where S is the combined size of all firms in an industry (scaled in terms of employees, sales, etc.), $s_i$ is the size of the ith firm, and there are n firms ranked from largest to smallest. Concentration increases with the value of the index.
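A minimal sketch, assuming the usual definition of the index as the sum of squared market shares, where each share is a firm's size divided by the combined size of all firms. The function name and the firm sizes are illustrative.

```python
def herfindahl_index(sizes):
    """Sum of squared shares s_i / S over all firms in the industry."""
    total = sum(sizes)
    if total <= 0:
        raise ValueError("combined size must be positive")
    return sum((s / total) ** 2 for s in sizes)


monopoly = herfindahl_index([100])                # 1.0
duopoly = herfindahl_index([50, 50])              # 0.5
four_equal = herfindahl_index([25, 25, 25, 25])   # 0.25
dominated = herfindahl_index([70, 10, 10, 10])    # 0.52
```

With n firms of equal size the index equals 1/n, its minimum; a single dominant firm pushes it towards 1.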

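Hermite functions:

Functions $\phi_j(x)$ given by

$$\phi_j(x) = (2^j j!\,\pi^{1/2})^{-1/2} \exp\left(-\tfrac{1}{2}x^2\right) H_j(x)$$

where $H_j(x)$ is the Hermite polynomial defined by

$$H_j(x) = (-1)^j \exp(x^2)\,\frac{d^j}{dx^j}\exp(-x^2)$$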

Hessian matrix:

The matrix of second-order partial derivatives of a function f of n independent variables $x_1, x_2, \ldots, x_n$ with respect to those variables. The element in the ith row and the jth column of the matrix is therefore $\partial^2 f/\partial x_i \partial x_j$. In statistics the matrix most often occurs for the log-likelihood as a function of a set of parameters. The inverse of the negative of the matrix, evaluated at the maximum likelihood estimates, then gives estimated variances and covariances (and hence standard errors) of the maximum likelihood estimates of the parameters.

Heterogeneous:

A term used in statistics to indicate the inequality of some quantity of interest (usually a variance) in a number of different groups, populations, etc. See also homogeneous.

Hettmansperger-McKean test:

A distribution free method for assessing whether specific subsets of the regression parameters in a multiple regression are zero.

Heuristic computer program:

A computer program which attempts to use the same sort of selectivity in searching for solutions that human beings use.

Heywood cases:

Solutions obtained when using factor analysis in which one or more of the variances of the specific variates become negative.

Hidden Markov models:

An extension of finite mixture models which provides a flexible class of models exhibiting dependence and a possibly large degree of variability.

Hierarchical design:

Synonym for nested design.

Hierarchical models:

A series of models for a set of observations where each model results from adding, deleting or constraining parameters from other models in the series. See also multilevel models.

Hidden time effects:

Effects that arise in data sets that may simply be a result of collecting the observations over a period of time. See also cusum.

Higgins’s law:

A ‘law’ that states that the prevalence of any condition is inversely proportional to the number of experts whose agreement is required to establish its presence.

High breakdown methods:

Methods that are designed to be resistant to even multiple severe outliers. Such methods are an extreme example of robust statistics.

Higher criticism statistic:

A statistic for testing whether m normal means are all zero against the alternative that a small fraction are non-zero. Can also be used for detecting non-normality in a data set.

Hill, Austin Bradford (1897-1991):

Born in Hampstead, London, Hill served as a pilot in World War I. Contracting tuberculosis prevented him taking a medical qualification, as he would have liked, so instead he studied for a London University degree in economics by correspondence. His interest in medicine drew him to work with the Industrial Fatigue Research Board, a body associated with the Medical Research Council. He improved his knowledge of statistics at the same time by attending Karl Pearson’s lectures at University College. In 1933 Hill became Reader in Epidemiology and Vital Statistics at the recently formed London School of Hygiene and Tropical Medicine (LSHTM). He was an extremely successful lecturer and a series of papers on Principles of Medical Statistics published in the Lancet in 1937 were almost immediately reprinted in book form. The resulting text remained in print until its ninth edition in 1971. In 1947 Hill became Professor of Medical Statistics at the LSHTM and Honorary Director of the MRC Statistical Research Unit. He had strong influence in the MRC, in particular in their setting up of a series of randomized controlled clinical trials, the first involving the use of streptomycin in pulmonary tuberculosis. Hill’s other main achievement was his work with Sir Richard Doll on a case-control study of smoking and lung cancer. Hill received the CBE in 1951, was elected Fellow of the Royal Society in 1954 and was knighted in 1961. He received the Royal Statistical Society’s Guy medal in gold in 1953. Hill died on 18 April 1991, in Cumbria, UK.

Hill-climbing algorithm:

An algorithm used in those techniques of cluster analysis which seek to find a partition of n individuals into g clusters by optimizing some numerical index of clustering. Since it is impossible to consider every partition of the n individuals into g groups (because of the enormous number of partitions), the algorithm begins with some given initial partition and considers individuals in turn for moving into other clusters, making the move if it causes an improvement in the value of the clustering index. The process is continued until no move of a single individual causes an improvement. See also K-means cluster analysis.
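The procedure can be sketched for one-dimensional data. The clustering index used below, the within-cluster sum of squares (smaller is better), is an assumption chosen for illustration, since the entry leaves the index open. The function names are hypothetical.

```python
def within_ss(points, labels, g):
    """Within-cluster sum of squared deviations from the cluster means."""
    total = 0.0
    for c in range(g):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            m = sum(members) / len(members)
            total += sum((p - m) ** 2 for p in members)
    return total


def hill_climb(points, initial_labels, g):
    """Move single individuals between clusters while the index improves."""
    labels = list(initial_labels)
    best = within_ss(points, labels, g)
    improved = True
    while improved:
        improved = False
        for i in range(len(points)):        # consider individuals in turn
            for c in range(g):
                if c == labels[i]:
                    continue
                old = labels[i]
                labels[i] = c               # tentative move
                score = within_ss(points, labels, g)
                if score < best - 1e-12:
                    best = score            # keep the move
                    improved = True
                else:
                    labels[i] = old         # undo the move
    return labels, best


points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels, index_value = hill_climb(points, [0, 1, 0, 1, 0, 1], 2)
```

The result depends on the initial partition and may be only a local optimum, so in practice several starting partitions are often tried.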

Hinge:

A more exotic (but less desirable) term for quartile.

Histogram:

A graphical representation of a set of observations in which class frequencies are represented by the areas of rectangles centred on the class interval. If the latter are all equal, the heights of the rectangles are also proportional to the observed frequencies. A histogram of heights of elderly women is shown in Fig. 73.

Historical controls:

A group of patients treated in the past with a standard therapy, used as the control group for evaluating a new treatment on current patients. Although used fairly frequently in medical investigations, the approach is not to be recommended since possible biases, due to other factors that may have changed over time, can never be satisfactorily eliminated. See also literature controls.

Historical prospective studies:

A ‘prospective study’ in which the cohort to be investigated and its subsequent disease history are identified from past records, for example, from information on an individual’s work history.

Hit rate:

A term occasionally used for the number of correct classifications in a discriminant analysis.

HLM:

Software for the analysis of multilevel models. See also MLwiN and VARCL.

Hodges-Lehmann estimator:

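An estimator for the location difference of two uncensored data samples, $y_1, y_2, \ldots, y_{n_1}$ and $y_{n_1+1}, \ldots, y_n$, $n = n_1 + n_2$, given by

$$\hat{\Delta} = \mathrm{median}\{(y_j - y_i) : i = 1, \ldots, n_1;\ j = n_1 + 1, \ldots, n\}$$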
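A minimal sketch, assuming the usual two-sample form of the estimator: the median of all pairwise differences between an observation in the second sample and an observation in the first. The sign convention (second minus first) and the function name are assumptions.

```python
from statistics import median


def hodges_lehmann(first, second):
    """Median of all differences (value in second) - (value in first)."""
    differences = [b - a for a in first for b in second]
    return median(differences)


shift = hodges_lehmann([1, 2, 3], [4, 5, 6])        # 3
robust = hodges_lehmann([1, 2, 3], [4, 5, 600])     # 3, despite the outlier
```

The second call illustrates the robustness of the estimator: a single wild observation leaves the estimate unchanged here.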

Hoeffding test:

A distribution free method for testing for the independence of two random variables X and Y, that is able to detect a broader class of alternatives to independence than is possible using sample correlation coefficients. [NSM Chapter 8.]

Fig. 73 A histogram of heights of elderly women.

Hoeffding, Wassily (1914-1991):

Born in Mustamaki, Finland, Hoeffding began his university education studying economics but quickly switched to mathematics eventually earning a Ph.D. degree from Berlin University in 1940 with a dissertation on nonparametric measures of association and correlation. He emigrated to the USA in 1946 settling in Chapel Hill, North Carolina. Hoeffding made significant contributions to sequential analysis, statistical decision theory and central limit theorems. He died on 28 February 1991 in Chapel Hill.

Hogben, Lancelot (1895-1975):

Born in Southsea, Hampshire, Hogben studied at Cambridge. He was a man of remarkable intellect and a great communicator who made important and original contributions in both theoretical and applied science. Hogben held academic appointments in zoology in England, Scotland, Canada and South Africa before becoming Professor of Zoology at Birmingham from 1941 to 1947, and then Professor of Medical Statistics at the same university from 1947 to 1961. During his career he held five university chairs. He is best remembered for his popular book, Mathematics for the Million. Hogben died on 22 August 1975.

Holdout estimate:

A method of estimating the misclassification rate in a discriminant analysis. The data is split into two mutually exclusive sets and the classification rule derived from one and its performance evaluated on the other. The method makes inefficient use of the data (using only part of them to construct the classification rule) and gives a pessimistically biased estimate of the derived rule’s misclassification rate.

Holdover effect:

Synonym for carryover effect.

Holgate, Philip (1934-1993):

Born in Chesterfield, UK, Holgate attended Newton Abbot Grammar School and then entered Exeter University in 1952 to read mathematics. He began his career as a school teacher, teaching mathematics and physics at Burgess Hill School, Hampstead. Holgate’s career as a statistician began in 1961 when he joined the Scientific Civil Service at Rothamsted. In 1967 he took up a lecturer post at Birkbeck College, London, progressing to professor in 1970. Holgate’s statistical work was primarily in the area of stochastic processes in biology, and he also made seminal contributions to non-associative algebras.

Hollander test:

A distribution free method for testing for bivariate symmetry. The null hypothesis tested is that the two random variables X and Y are exchangeable.

Homogeneous:

A term that is used in statistics to indicate the equality of some quantity of interest (most often a variance), in a number of different groups, populations, etc. See also heterogeneous.

Horvitz-Thompson estimator:

An unbiased estimator of the total size of a population when sampling is with or without replacement from a finite population and sampling unit i has probability $p_i$ of being included in the sample. The estimator does not depend on the number of times a unit may be selected, since each unit is utilized only once in the formula

$$\hat{\tau} = \sum_{i=1}^{\nu}\frac{y_i}{p_i}$$

where $\nu$ is the effective sample size (the number of distinct units in the sample) and $y_i$ is the number of individuals, animals or species observed in the ith sampling unit. If the probability that both unit i and unit j are included in the sample is $p_{ij}$ and all these joint inclusion probabilities are greater than zero, then an unbiased estimator of the variance of $\hat{\tau}$ is

$$\widehat{\mathrm{var}}(\hat{\tau}) = \sum_{i=1}^{\nu}\left(\frac{1}{p_i^2} - \frac{1}{p_i}\right)y_i^2 + 2\sum_{i=1}^{\nu}\sum_{j>i}\left(\frac{1}{p_i p_j} - \frac{1}{p_{ij}}\right)y_i y_j$$
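The sketch below assumes the standard forms of the estimator and its variance estimator: the total is the sum of each distinct unit's count divided by its inclusion probability, and the variance combines a term in the single inclusion probabilities with a cross-product term in the joint ones. The function names and the way joint probabilities are passed (a dictionary keyed by index pairs) are illustrative choices.

```python
def horvitz_thompson_total(y, p):
    """Sum of y_i / p_i over the distinct units in the sample."""
    return sum(yi / pi for yi, pi in zip(y, p))


def horvitz_thompson_variance(y, p, joint):
    """Variance estimate; joint[(i, j)] with i < j is the joint inclusion probability."""
    v = len(y)
    var = sum((1.0 / p[i] ** 2 - 1.0 / p[i]) * y[i] ** 2 for i in range(v))
    for i in range(v):
        for j in range(i + 1, v):
            var += 2.0 * (1.0 / (p[i] * p[j]) - 1.0 / joint[(i, j)]) * y[i] * y[j]
    return var


# Simple random sample of 2 units from 4, without replacement:
# each inclusion probability is 2/4 and the joint probability is (2*1)/(4*3).
y = [1.0, 3.0]
p = [0.5, 0.5]
joint = {(0, 1): 1.0 / 6.0}
total = horvitz_thompson_total(y, p)
variance = horvitz_thompson_variance(y, p, joint)
```

In this equal-probability case the results agree with the familiar simple random sampling formulas: the total is N times the sample mean and the variance is N²(1 − n/N)s²/n.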

Hot deck:

A method widely used in surveys for imputing missing values. In its simplest form the method involves sampling with replacement m values from the set $A_r$ of sample respondents to an item y, where m is the number of non-respondents to the item and r is the number of respondents. The sampled values are used in place of the missing values. In practice, the accuracy of imputation is improved by first forming two or more imputation classes using control variables observed in all sample units, and then applying the procedure separately within each imputation class for each item with missing values. [Statistics in Medicine, 1997, 16, 5-19.]
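The simplest form of the method can be sketched as follows. Missing responses are represented by `None`, which is a choice made for this illustration, and each is replaced by a value drawn with replacement from the observed responses. Imputation classes are not shown, and the function name is hypothetical.

```python
import random


def hot_deck_impute(values, rng=None):
    """Replace each missing value (None) by a randomly chosen observed value."""
    rng = rng or random.Random()
    donors = [v for v in values if v is not None]    # the respondents
    if not donors:
        raise ValueError("no respondents to draw from")
    # rng.choice draws independently each time, i.e. with replacement.
    return [v if v is not None else rng.choice(donors) for v in values]


item = [12, None, 15, None, 9]
completed = hot_deck_impute(item, random.Random(0))
```

To apply the method within imputation classes, the same function would be called separately on the values belonging to each class.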

Hotelling, Harold (1895-1973):

Born in Fulda, Minnesota, Hotelling first studied journalism at the University of Washington but eventually turned to mathematics, gaining a Ph.D. in 1924 for his dissertation in the field of topology. Hotelling worked first at Stanford University before, in 1931, being appointed Professor of Economics at Columbia University. His major contributions to statistical theory were in multivariate analysis and probably his most important paper was ‘The generalization of Student’s ratio’, which introduced the statistic now known as Hotelling’s $T^2$. He also played a major role in the development of principal components analysis and of canonical correlations. He was elected to the National Academy of Sciences in 1972 and, in 1973, to membership of the Accademia Nazionale dei Lincei in Rome. Hotelling died on 26 December 1973 in Chapel Hill, North Carolina.

Hotelling’s $T^2$ test:

A generalization of Student’s t-test for multivariate data. Can be used to test either whether the population mean vector of a set of q variables is the null vector or whether the mean vectors of two populations are equal. In the latter case the relevant test statistic is calculated as

$$T^2 = \frac{n_1 n_2}{n_1 + n_2}\,(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^{\prime}\,\mathbf{S}^{-1}\,(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$$

where $n_1$ and $n_2$ are sample sizes, $\bar{\mathbf{x}}_1$ and $\bar{\mathbf{x}}_2$ are sample mean vectors, and S is a weighted average of the separate sample variance-covariance matrices. Under the hypothesis that the population mean vectors are the same,

$$F = \frac{(n_1 + n_2 - q - 1)\,T^2}{(n_1 + n_2 - 2)\,q}$$

has an F-distribution with q and $(n_1 + n_2 - q - 1)$ degrees of freedom. See also Mahalanobis $D^2$.
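A hedged sketch of the two-sample calculation for q = 2 variables follows. It assumes the usual definitions: S is the pooled covariance matrix with divisor n1 + n2 − 2, the statistic is n1·n2/(n1 + n2) times the quadratic form in the difference of mean vectors, and the F value is (n1 + n2 − q − 1)T²/((n1 + n2 − 2)q). Restricting to two variables keeps the matrix inverse explicit, and the function names are illustrative.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def scatter(rows, m):
    """2 x 2 matrix of sums of squares and cross-products about the mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def hotelling_two_sample(sample1, sample2):
    """Return (T squared, F) for two samples of bivariate observations."""
    n1, n2, q = len(sample1), len(sample2), 2
    m1, m2 = mean_vector(sample1), mean_vector(sample2)
    w1, w2 = scatter(sample1, m1), scatter(sample2, m2)
    # Pooled covariance matrix: weighted average of the two sample matrices.
    s = [[(w1[a][b] + w2[a][b]) / (n1 + n2 - 2) for b in range(2)] for a in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    quad = sum(d[a] * inv[a][b] * d[b] for a in range(2) for b in range(2))
    t2 = n1 * n2 / (n1 + n2) * quad
    f = (n1 + n2 - q - 1) * t2 / ((n1 + n2 - 2) * q)
    return t2, f


group1 = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
group2 = [(x + 1.0, y + 1.0) for x, y in group1]   # same spread, shifted means
t2, f = hotelling_two_sample(group1, group2)
```

The F value would then be referred to an F-distribution with 2 and 5 degrees of freedom for these sample sizes.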

Hot hand hypothesis:

Synonymous with streaky hypothesis.

Household interview surveys:

Surveys in which the primary sampling units are typically geographic areas such as counties or cities. For each such unit sampled, there are additional levels of subsampling involving successively smaller geographic areas, for example, census districts, neighbourhoods within census districts and households within neighbourhoods. Individuals within sampled households may also be sub-sampled. The main purpose of the multistage cluster sampling is to lessen the number of areas to which interviewers must travel.

Hsu, Pao-Lu (1910-1970):

Born in Beijing, China, Hsu first studied chemistry at what was later to become Beijing University, but transferred to the Department of Mathematics in Qing Hua (Tsinghua) University in 1930. In 1938 he received a Ph.D. degree from University College, London. Hsu worked in a number of areas of probability theory and mathematical statistics, particularly on the distribution of sample variances from non-normal populations. In 1956 he was made Director of the first research institute for probability and statistics to be established in China. Hsu died on 18 December 1970 in Beijing.

Huberized estimator:

Synonym for sandwich estimator.

Huber’s condition:

A necessary and sufficient design condition for the estimates from using least squares estimation in linear models to have an asymptotic normal distribution provided the error terms are independently and identically distributed with finite variance. Given explicitly by

$$\lim_{n \to \infty}\ \max_{1 \le i \le n} h_{iin} = 0$$

where $h_{iin}$ are the diagonal elements of the hat matrix.

Human capital model:

A model for evaluating the economic implication of disease in terms of the economic loss of a person succumbing to morbidity or mortality at some specified age. Often such a model has two components, the direct cost of disease, for example, medical management and treatment, and the indirect cost of disease, namely the loss of economic productivity due to a person being removed from the labour force. [Berichte über Landwirtschaft, 1996, 74, 165-85.]

Human height growth curves:

The growth of human height is, in general, remarkably regular, apart from the pubertal growth spurt. A satisfactory longitudinal growth curve is extremely useful as it enables long series of measurements to be replaced by a few parameters, and might permit early detection and treatment of growth abnormalities. Several such curves have been proposed, of which perhaps the most successful is the following five-parameter curve

$$X = A - \frac{2(A - B)}{\exp[C(t - E)] + \exp[D(t - E)]}$$

where t = time (prenatal age measured from the day of birth), X = height reached at age t, A = adult height, B = height reached by child at age E, C = a first time-scale factor in units of inverse time, D = a second time-scale factor in units of inverse time, E = approximate time at which the pubertal growth spurt occurs. [Biometrics, 1988, 44, 995-1003.]
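A sketch of a five-parameter curve consistent with the parameter definitions in this entry: height rises towards the adult value A, equals B at age E, and the two time-scale factors C and D control the approach. The functional form used, X = A − 2(A − B)/(exp[C(t − E)] + exp[D(t − E)]), is assumed here, and the parameter values are purely hypothetical.

```python
import math


def height(t, A, B, C, D, E):
    """Height at age t under the assumed five-parameter growth curve."""
    return A - 2.0 * (A - B) / (math.exp(C * (t - E)) + math.exp(D * (t - E)))


# Hypothetical parameter values (cm, years), chosen only for illustration.
params = dict(A=175.0, B=160.0, C=0.1, D=1.2, E=14.0)
at_spurt = height(14.0, **params)    # equals B by construction
as_adult = height(60.0, **params)    # effectively the adult height A
at_ten = height(10.0, **params)
```

Fitting the five parameters to a child's measurements would require non-linear least squares, which is not shown.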

Huynh-Feldt correction:

A correction term applied in the analysis of data from longitudinal studies by simple analysis of variance procedures, to ensure that the within subject F-tests are approximately valid even if the assumption of sphericity is invalid. See also Greenhouse-Geisser correction and Mauchly test.

Hyperbolic distributions:

Probability distributions, f(x), for which the graph of log f(x) is a hyperbola.

Hyperexponential distribution:

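A term sometimes used for a mixture of two exponential distributions with different rate parameters, $\lambda_1$ and $\lambda_2$ (and hence different means, $1/\lambda_1$ and $1/\lambda_2$), and mixing proportion p, i.e. the probability distribution given by

$$f(x) = p\,\lambda_1 e^{-\lambda_1 x} + (1 - p)\,\lambda_2 e^{-\lambda_2 x}$$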

Hypergeometric distribution:

A probability distribution associated with sampling without replacement from a population of finite size. If the population consists of r elements of one kind and N − r of another, then the probability of finding x elements of the first kind when a random sample of size n is drawn is given by

$$P(x) = \frac{\binom{r}{x}\binom{N - r}{n - x}}{\binom{N}{n}}$$

The mean of x is nr/N and its variance is

$$\left(\frac{nr}{N}\right)\left(1 - \frac{r}{N}\right)\left(\frac{N - n}{N - 1}\right)$$

When N is large and n is small compared to N, the hypergeometric distribution can be approximated by the binomial distribution.
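The probabilities and moments can be checked numerically. The sketch assumes the standard form of the distribution, in which the probability of x elements of the first kind is the product of the binomial coefficients C(r, x) and C(N − r, n − x) divided by C(N, n). The function name and the numbers are illustrative.

```python
from math import comb


def hypergeometric_pmf(x, N, r, n):
    """Probability of x elements of the first kind in a sample of size n."""
    if x < 0 or x > n or x > r or n - x > N - r:
        return 0.0                      # outside the support
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)


N, r, n = 10, 4, 3
probs = [hypergeometric_pmf(x, N, r, n) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(probs))
variance = sum(x * x * px for x, px in enumerate(probs)) - mean ** 2
```

For these values the mean is nr/N = 1.2 and the variance is (nr/N)(1 − r/N)(N − n)/(N − 1) = 0.56, which the direct calculation reproduces.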

Hyperparameter:

A parameter (or vector of parameters) $\theta_2$ that indexes a family of possible prior distributions for a parameter $\theta_1$ in Bayesian inference, i.e. $\theta_2$ is a parameter of a distribution on parameters. An investigator needs to specify the value of $\theta_2$ in order to specify the chosen prior distribution.

Hypothesis testing:

A general term for the procedure of assessing whether sample data is consistent or otherwise with statements made about the population. See also null hypothesis, alternative hypothesis, composite hypothesis, significance test, significance level, type I and type II error.