Standardization is a technique used in comparing indicators from two or more populations. The goal of the standardization procedure is to control for compositional differences between these groups that may influence the indicator that is being examined. This method allows a researcher to determine the extent to which differences in the rates of events between populations are due to differences in population characteristics. Sociologists often ask questions that require comparisons between groups of people: Which city has a higher crime rate? Which country has lower mortality? Which ethnic group is more likely to coreside with elderly family members? In making these comparisons, one usually calculates a summary measure: crimes per capita, crude death rate, or the proportion of elders living with family members. However, any two groups of people are likely to differ along several dimensions, such as age, educational level, race, and income. These dimensions, or factors, also may be related to the event being explored. As a result, the summary measure to some extent reflects the compositional differences in the groups being studied.
Standardization historically has been a central aspect of demographic methods (Bogue 1969; Hinde 1998; Murdock and Ellis 1991; Shryock and Siegel 1980), but its importance extends beyond that use to a way of thinking about summary or aggregate measures. While offering the advantage of conciseness, aggregate measures mask underlying compositional differences, and the use of standardization represents an acknowledgment that population characteristics influence the rate at which events occur in a population. Summary indicators are very useful; they provide a single number for comparison rather than a whole series of numbers, and they are easily calculated. However, comparisons among population groups or among subgroups in a population should account for the differing compositional makeup of those groups. Demographers have been led to standardization for several reasons. First, there is a natural desire to make comparisons between groups along demographic indicators: crude death rates, crude birthrates, marriage rates, and employment, among others. Standardization allows these comparisons to reflect differences in the underlying processes, rather than being confounded by the effects of composition. Standardization procedures can accommodate the effects of a single factor or many factors, leaving the technique bounded only by the available data. Standardization also allows the estimation of indicators for groups for which data are incomplete or of poor quality.
Many demographic measures are affected by the composition of the population, particularly the age distribution. Age composition is especially critical in considering crude death rates, since mortality rates have a very distinctive age-specific pattern: high at very young and very old ages.
Populations with a large proportion of persons in those age groups experience a large number of deaths, regardless of age-specific rates of mortality. Two populations with identical sets of age-specific rates of mortality but different age distributions will have different crude death rates. The removal of the "interference" of age distribution from the summary measure (the crude death rate) is the goal of the standardization procedure. In the rest of this article, the standardization procedure will be explained using mortality rates, and then several other examples of standardization will be presented.
The first step in a comparison is to calculate a crude rate or proportion. Crude rates or proportions are calculated by the formula

$$CR = \frac{E}{P} \tag{1}$$
where E refers to the number of events of interest in the population during the time period and P refers to the population during that period. If the population is measured at the middle of the year and the events occur throughout the year, this proportion can be interpreted as a rate. In cases where this proportion is small, for instance, mortality rates, the crude rate commonly is multiplied by 1,000 and reported as the number of events per 1,000 people.
Crude rates or proportions are used to represent a variety of characteristics of a population. These rates have an advantage over a comparison of absolute numbers, since they account for differences in size between two populations. Obviously, in a comparison of the annual number of homicides in Chicago versus that in Seattle, one must account for the fact that the population of Chicago is 2.8 million people compared to about one-half million in Seattle. Similarly, comparing the number of deaths in the United States (over 2 million) to those in Sweden (about 90,000) in 1994 would be unreasonable without knowing that the population of the United States is roughly thirty times that of Sweden.
Despite the advantage of crude rates over absolute numbers, crude rates are influenced by the composition of the populations being compared. If the event of interest varies by some factor and the two populations have varying levels of that factor, the crude rates will partly reflect this compositional variation rather than only a difference in the rate at which the event is occurring. If the populations being compared are standardized with respect to the factor, any remaining difference between the standardized rates can be attributed to a true difference in rates of occurrence. If the difference disappears after standardization, one can conclude that the compositional variation, rather than a difference in the underlying rates of occurrence, led to the difference in the crude rates of events.
To understand the rationale of standardization, it is necessary to recognize that in essence, the crude rate is a weighted average of a set of factor-specific rates, where the weights are the distribution of the factor in the population. Thinking in this manner, one can rewrite the crude rate as

$$CR = \frac{E}{P} = \sum_a \frac{e_a}{p_a} \cdot \frac{p_a}{P} \tag{2}$$
where $p_a$ is the population in group $a$ and $e_a$ is the number of events occurring in group $a$. The sum of all $e_a$ equals the total number of events, $E$, and the sum of all $p_a$ equals the total population, $P$. Note that this equation has two components. The first, $e_a/p_a$, represents the group-specific rate of events or the group-specific proportion, which sometimes is expressed as $m_a$. The second component of the rate calculation, $p_a/P$, represents the proportion of the population in each of the $a$ groups. These are the two series of elements needed to apply the direct standardization technique. Using this notation, the crude rate can be rewritten as

$$CR = \sum_a m_a \cdot \frac{p_a}{P} \tag{3}$$
When the formula for the crude rate is written in this manner, it is easy to see how the composition of the population, that is, its distribution among the $a$ groups, affects the crude rate. If the group-specific rate $m_a$ is high when the proportion of the population in that group, $p_a/P$, is large, more events will be observed in the total population than will be observed if $p_a/P$ is small. Similarly, if $m_a$ is small when $p_a/P$ is small, few events will occur.
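The weighted-average form of the crude rate can be illustrated with a short Python sketch. The age groups, rates, and population counts below are hypothetical values chosen only for illustration, and the function name `crude_rate` is not from any library. The sketch shows that two populations with identical group-specific rates but different age distributions end up with different crude rates.

```python
def crude_rate(group_rates, group_populations):
    """Crude rate as a weighted average of group-specific rates m_a,
    with weights p_a / P (the share of the population in each group)."""
    total = sum(group_populations)
    return sum(m * p / total for m, p in zip(group_rates, group_populations))


# Hypothetical age-specific death rates (deaths per person per year)
# for three age groups: young, middle, old. Both populations share them.
rates = [0.002, 0.005, 0.050]

# Hypothetical age distributions (persons in each age group).
younger_population = [500_000, 400_000, 100_000]
older_population = [300_000, 400_000, 300_000]

cr_younger = crude_rate(rates, younger_population) * 1000
cr_older = crude_rate(rates, older_population) * 1000

print(f"Younger population: {cr_younger:.1f} deaths per 1,000")  # 8.0
print(f"Older population:   {cr_older:.1f} deaths per 1,000")    # 17.6
```

In this hypothetical case the whole difference between the two crude rates comes from composition, since the group-specific rates are the same by construction.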
A comparison of the crude death rates in Sweden and the United States provides an example of the use of standardization. Sweden has one of the world's highest life expectancies at birth, approximately 76 years for men and 81.4 years for women in 1994. The crude death rate of Sweden, however, was about 10.4 deaths per 1,000 in that year. In contrast, life expectancy at birth in the United States was 72.2 years for men and 78.8 years for women in 1993, and the crude death rate was about 8.6 deaths per 1,000 in that year (United Nations 1997). It seems natural to expect that the country with the longer life expectancy would also have the lower crude death rate, so what accounts for this discrepancy? To understand the reason for this difference in the crude rates, it is necessary to observe the differing age distributions of the two populations. In the United States about 13 percent of the population is over age 65, while in Sweden over 17 percent of people are over that age. Since death rates are highest in this age range, the larger proportion of the Swedish population in old age creates more deaths, even with lower age-specific death rates. Standardization demonstrates the extent to which these differences in age distribution account for the difference in the crude death rate.
As was mentioned above, this method of standardization, direct standardization, requires a standard population distribution and a set of factor-specific rates for the populations being studied. Direct standardization uses this standard population to calculate new standardized crude rates for the populations of interest. In this case, the population distribution of the standard population replaces the observed population distribution. Since each population's crude rate will be calculated with the same distribution, the effect of the compositional differences will be eliminated and each population will have the same composition. To apply direct standardization, the formula for the directly standardized rate of population $j$ (written here as $DSR_j$)

$$DSR_j = \sum_a \frac{e_{ja}}{p_{ja}} \cdot \frac{p_{sa}}{P_s} \tag{4}$$
is used, where $e_{ja}$ represents the number of events occurring in group $a$ in population $j$, $p_{ja}$ represents the population size of group $a$ in population $j$, $p_{sa}$ represents the number of people in group $a$ in the standard population $s$, and $P_s$ represents the total size of the standard population. Comparing equations (2) and (4) shows the similarities. The second term in equation (2), the compositional distribution of the population of interest, $p_a/P$, has been replaced with the compositional distribution of the standard population, $p_{sa}/P_s$. The first term in the crude rate calculation remains the factor-specific rate in the population of interest, population $j$.
Returning to the example of the United States and Sweden, using the age distribution of the United States as the standard distribution and computing a standardized crude death rate for Sweden by applying the age-specific death rates of Sweden yields a standardized crude death rate of 7.6 deaths per 1,000 for Sweden. Instead of being higher than the crude death rate in the United States, Sweden's standardized crude death rate falls below that of the United States. At least part of the difference in the crude rates therefore is due to Sweden's older population rather than to a difference in age-specific death rates. In general, populations with a relatively old age distribution tend to have higher crude death rates than do younger populations with similar age-specific mortality patterns, since death rates are higher at older ages.
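Direct standardization can be sketched in a few lines of Python. All counts below are hypothetical and are not the Swedish or U.S. data discussed above; the function name `directly_standardized_rate` is likewise an illustrative choice. Population J is older and has lower age-specific rates than population K, which serves as the standard.

```python
def directly_standardized_rate(events, population, standard_population):
    """Apply a population's own group-specific rates (e_ja / p_ja)
    to the composition of a standard population (p_sa / P_s)."""
    rates = [e / p for e, p in zip(events, population)]
    standard_total = sum(standard_population)
    return sum(m * ps / standard_total
               for m, ps in zip(rates, standard_population))


# Hypothetical data for three age groups: young, middle, old.
events_j = [300, 1_600, 12_000]
population_j = [300_000, 400_000, 300_000]   # older population
events_k = [1_000, 2_000, 5_000]
population_k = [500_000, 400_000, 100_000]   # younger population, the standard

crude_j = sum(events_j) / sum(population_j) * 1000
crude_k = sum(events_k) / sum(population_k) * 1000
dsr_j = directly_standardized_rate(events_j, population_j, population_k) * 1000

print(f"Crude rate, J:        {crude_j:.1f} per 1,000")  # 13.9
print(f"Crude rate, K:        {crude_k:.1f} per 1,000")  # 8.0
print(f"Standardized rate, J: {dsr_j:.1f} per 1,000")    # 6.1
```

As in the Sweden example, the ranking reverses once both populations are given the same age distribution: J has the higher crude rate but the lower standardized rate.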
The data demands for direct standardization, while not overwhelming, can be difficult to meet if there is limited information on factor-specific rates in one of the populations of interest. For example, in many studies of mortality in less developed countries or in a historical perspective, information on age-specific death rates may be missing or unreliable. In these cases, an alternative method referred to as indirect standardization can be used. Indirect standardization requires knowledge only of the composition of the population and the total number of events of interest. Direct standardization involves the application of population-specific sets of rates to a standard population; conversely, indirect standardization involves the application of a standard set of rates to individual population distributions. In indirect standardization, a set of standard rates is applied to the population and the expected number of events is compared to the actual number. This standardizing ratio (written here as $SR_j$) is estimated by the formula

$$SR_j = \frac{E_j}{\sum_a m_{sa} \, p_{ja}} \tag{5}$$
where $E_j$ is the actual number of events in the population $j$, $m_{sa}$ is the factor-specific rate in the standard population $s$, and $p_{ja}$ is the number of people in population $j$ who are in group $a$. The denominator of the ratio calculates the number of events that would be expected in population $j$ if the factor-specific rates of the standard population were applied to the population. When the event of interest is death, this ratio often is referred to as the standardized mortality ratio. To obtain the new indirectly standardized crude rate (ISR), this standardizing ratio is multiplied by the crude rate for the standard population:

$$ISR_j = SR_j \cdot CR_s \tag{6}$$
where $CR_s$ is the crude rate in the standard population. These indirectly standardized crude rates then can be compared to each other. Obviously, when the standardizing ratio is greater than 1.0, the ISR will be larger than the crude rate for the standard population, and when the standardizing ratio is less than 1.0, the ISR will be smaller than the standard population's crude rate.
Indirect standardization does not control for composition as well as the direct standardization method does but should yield similar results in terms of direction and magnitude. Returning to the example of Sweden and the United States, the expected number of deaths in Sweden would be greater than the observed number if U.S. age-specific death rates were applied to the Swedish population's age distribution. The resulting standardized mortality ratio would be 0.912, and when that was multiplied by the crude rate for the United States, the ISR for Sweden would be 7.8, very similar to the result obtained through direct standardization.
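A minimal Python sketch of indirect standardization follows. The counts and rates are hypothetical, not the actual Swedish or U.S. figures, and the function name `indirect_standardization` is an illustrative choice. Only the total number of events and the age distribution of the study population are needed, together with the standard population's rates and crude rate.

```python
def indirect_standardization(observed_events, population,
                             standard_rates, standard_crude_rate):
    """Return the standardizing ratio (observed / expected events)
    and the indirectly standardized rate (ratio * standard crude rate)."""
    expected = sum(m * p for m, p in zip(standard_rates, population))
    ratio = observed_events / expected
    return ratio, ratio * standard_crude_rate


# Hypothetical study population: total events and age distribution only.
observed_events_j = 13_900
population_j = [300_000, 400_000, 300_000]

# Hypothetical standard population: age-specific rates and crude rate.
standard_rates = [0.002, 0.005, 0.050]
standard_crude_rate = 0.008

ratio, isr = indirect_standardization(
    observed_events_j, population_j, standard_rates, standard_crude_rate
)
print(f"Standardizing ratio: {ratio:.3f}")              # 0.790
print(f"ISR:                 {isr * 1000:.2f} per 1,000")  # 6.32
```

The same final step reproduces the figure reported above for Sweden: a ratio of 0.912 multiplied by the U.S. crude rate of 8.6 gives approximately 7.8 deaths per 1,000.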
When indirect standardization is employed, there is no choice to be made about the standard population; this method is used when only one population distribution is available. The choice of the standard population for direct standardization should be considered carefully, but within reasonable bounds the choice of standard should not alter the conclusions radically. Researchers generally are interested in the direction and approximate size of differences between the groups, and these values are preserved with the choice of any of a number of reasonable standard populations. There are three general choices for the standard: use one of the populations being studied, use an average of the populations, or use a population outside those being studied. Each of these choices has advantages and disadvantages. Theoretically, the choice of standard should be made to minimize the effects of that choice on the results.
Using one of the populations being studied eliminates the need to standardize that population and often makes the explication of comparisons easier. For instance, in comparing crime rates across several cities, choosing one city as the basis for comparison may be appropriate. When comparisons are made of a population over time, it is standard procedure to choose a distribution that is representative of the middle of the time period. For instance, in a study of mortality change between 1950 and 1990 in the United States, it would be appropriate to use the 1970 census for the standard age distribution. A drawback to using one of the study populations as the standard, however, can be that the population chosen has an unusual distribution of factors. This unusual distribution may skew the summary measures in a way that is inconsistent or difficult to interpret. Also, choosing one of the populations as a standard can carry implications that this distribution is the "ideal" or "correct" distribution and may place interpretational burdens on the results.
Using an average of the populations eliminates the problem of setting one population as the ideal and ameliorates the problem of unusual distributions. A comparison of racial differences in mortality in the United States, for example, might use the age distribution of the total U.S. population, which is an average of the distributions of the racial groups weighted by group size, as the standard. This choice eliminates the assumption that any one population has a preferred distribution and allows for meaningful comparisons among groups. The use of an aggregate population as the standard is encountered frequently in comparisons of subgroups within a national population.
A third choice is to pick a population completely exogenous to the study as a standard. This choice most often involves an artificial population that is representative of a standard pattern of factor distributions. Several sources of standard populations exist. In the case of age, Coale and Demeny’s (1983) set of regional model life tables contains sets of age distributions typical of a variety of mortality levels and patterns. The use of an external standard eliminates any value judgments associated with the choice of standard. An external standard also can be chosen to minimize or eliminate extreme distributions of factors. The external standard also provides a way of comparing very diverse populations. Again, the choice of standard should match the populations being studied as closely as possible to minimize the effect of that choice on the results.
An exogenous standard also might be employed as a way to simulate the effects of a variety of changes in population composition on the crude rate. This use of the standardization technique highlights the underlying logic of the procedure by using the method to investigate the extent to which compositional changes influence aggregate comparisons. Here the technique is used as a methodological device to explore the effects of changes. For instance, a researcher might be interested in the effects on average wages of changing occupational structures among men and women. A testable hypothesis could be that as women approach men in terms of occupational distribution, the gender gap in wages will disappear. If a variety of simulated occupational structures are applied to a set of gender- and occupation-specific wage rates, the effect of occupational structure on the wage gap can be examined.
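The wage simulation described here can be sketched as follows. The wages and occupational shares are hypothetical, and the helper names `mean_wage` and `blend` are illustrative choices. Women's occupational distribution is moved in steps from its observed form toward the men's distribution while the gender- and occupation-specific wages are held fixed.

```python
def mean_wage(wages, shares):
    """Average wage as a share-weighted mean of occupation-specific wages."""
    return sum(w * s for w, s in zip(wages, shares))


def blend(start, target, t):
    """Move a distribution a fraction t of the way from start to target."""
    return [(1 - t) * a + t * b for a, b in zip(start, target)]


# Hypothetical hourly wages in three occupations, by gender.
men_wages = [30.0, 20.0, 15.0]
women_wages = [27.0, 19.0, 15.0]

# Hypothetical occupational distributions (shares sum to 1).
men_shares = [0.4, 0.4, 0.2]
women_shares = [0.2, 0.3, 0.5]

gaps = {}
for t in (0.0, 0.5, 1.0):
    simulated_shares = blend(women_shares, men_shares, t)
    gaps[t] = (mean_wage(men_wages, men_shares)
               - mean_wage(women_wages, simulated_shares))
    print(f"Convergence {t:.0%}: wage gap = {gaps[t]:.2f}")
```

With these hypothetical numbers the gap narrows as the distributions converge but does not vanish, because wages still differ within occupations. The hypothesis that the gap disappears entirely would be rejected for this illustrative data.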
Since standardization developed in the field of demography, most applications involve the study of demographic phenomena. The example of the United States and Sweden involved comparisons of mortality rates. However, standardization is used widely in other areas as well. For example, the U.S. Census Bureau routinely reports the distribution of the American population aged 15 and older among marital states, and historical comparisons of this distribution are used to examine changes in marital behavior over time. However, the age composition of the population can greatly influence the distribution among marital states, particularly when the proportion of the population in the age range of 15 to 25 years is very large. In 1960, 65.6 percent of women aged 15 and older were married compared to 60.4 percent of similarly aged women in 1975 (United States Bureau of the Census 1976). At first glance, these comparisons seem to signal a retreat from marriage: A smaller proportion of women was married in 1975 than in 1960. However, when the age distribution of the population is standardized to the 1960 population, the proportion married in 1975 increases to 63.5 percent. While this is still a decline compared to 1960, the magnitude of the change is much smaller. The difference in the proportion married is due largely to a difference between 1960 and 1975 in the proportion of women just over the age of 15: the baby boomers, who in 1975 were young women who had not yet married.
Standardization can be used to control for characteristics other than age. Suppose, for instance, one is comparing the health status of two different groups: elderly white Americans and elderly African-Americans. If we compare the proportion of each group in poor health, we find that 34 percent of elderly whites and 50 percent of elderly African-Americans report their health as fair or poor. However, we know that health status varies by education and that the educational distributions of these two groups differ. Among elderly whites, about 12 percent have fewer than eight years of school, compared to 39 percent of elderly African-Americans. Clearly, since lower levels of education are associated with poorer health and elderly African-Americans have lower levels of educational attainment, some of the difference in observed health status between the groups can be expected to result from the different educational compositions.
It is desirable to compare these two groups without the influence of education. Using the educational distribution of the elderly white population as a standard and applying the observed education-specific rates of poor health among elderly African-Americans, one obtains an overall proportion of 42 percent in poor health, compared to the unstandardized proportion of 50 percent. Thus, if the African-American older population had an educational distribution similar to that of the more highly educated white elderly population, the expected health status of older African-Americans would improve.
Lichter and Eggebeen (1994) used standardization techniques to examine the effects of parental employment on rates of child poverty. In their work these researchers use direct standardization techniques in two different ways. In the first, they simulate the effects of a variety of assumptions about parental employment patterns on children's poverty rates. This is an illustration of using an "exogenous" or artificial population distribution as a standard population. By changing the employment distribution of the parents of children in poverty, they determine that only modest declines in child poverty would result from increasing those levels of employment. Their second application of standardization compares the poverty rates of black children obtained by using the employment distribution of white parents as the standard to the rates directly observed. In this case, they have chosen one of the study populations as the standard and are interested in the extent to which differences in child poverty between blacks and whites are determined by factors other than parental employment distributions. They find in fact that parental employment differences among female-headed families account for a substantial portion of the observed differences in child poverty.
Standardization can control for more than one factor at a time and can be applied to more than two groups. Himes et al. (1996) standardize for age, sex, and marital status in an examination of the living arrangements of minority elderly in the United States. Living arrangements are known to be different for men and women, for married and unmarried, and for younger and older elderly. These factors (age, sex, and marital status) also are known to vary across racial and ethnic subgroups. Therefore, the observed differences in living arrangements are likely to be due in part to these underlying characteristics rather than being a reflection of differences in attitudes or beliefs. Standardization allows a comparison among groups without the influence of these compositional differences. In this research, the compositional distribution of the entire United States with respect to age, sex, and marital status was chosen as the standard. In this analysis, the standardization procedure had the greatest effect on comparisons of the African-American population and much smaller effects on the white non-Hispanic, Hispanic, Asian, and Native American populations.
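Standardizing for several factors at once only requires that rates and standard counts be cross-classified by every factor. The Python sketch below assumes a hypothetical two-by-two-by-two classification by age, sex, and marital status. The counts and proportions are invented for illustration and are not taken from Himes et al. (1996); the function name `standardized_proportion` is also an illustrative choice.

```python
def standardized_proportion(cell_rates, standard_counts):
    """Directly standardized proportion over cross-classified cells.
    Both arguments are dicts keyed by (age, sex, marital_status)."""
    total = sum(standard_counts.values())
    return sum(cell_rates[cell] * n / total
               for cell, n in standard_counts.items())


# Hypothetical standard population counts in each cell.
standard_counts = {
    ("65-74", "F", "married"): 300, ("65-74", "F", "unmarried"): 200,
    ("65-74", "M", "married"): 350, ("65-74", "M", "unmarried"): 100,
    ("75+", "F", "married"): 100,   ("75+", "F", "unmarried"): 250,
    ("75+", "M", "married"): 150,   ("75+", "M", "unmarried"): 50,
}

# Hypothetical proportions living alone in one study group, by cell.
living_alone = {
    ("65-74", "F", "married"): 0.02, ("65-74", "F", "unmarried"): 0.60,
    ("65-74", "M", "married"): 0.02, ("65-74", "M", "unmarried"): 0.50,
    ("75+", "F", "married"): 0.02,   ("75+", "F", "unmarried"): 0.70,
    ("75+", "M", "married"): 0.02,   ("75+", "M", "unmarried"): 0.60,
}

result = standardized_proportion(living_alone, standard_counts)
print(f"Standardized proportion living alone: {result:.3f}")  # 0.262
```

Repeating the call with each group's own cell-specific proportions and the same `standard_counts` gives figures that can be compared across groups. The number of cells grows multiplicatively with each added factor, which is why the technique is bounded by the available data.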
Standardization is widely used in a variety of sociological inquiries. While it originated in demographic analyses, it can be applied to a variety of questions in which a researcher wants to determine the extent to which compositional differences in population groups account for observed differences in summary measures. Standardization is also useful as a simulation technique, allowing researchers to explore the effects of a variety of compositional changes on a summary indicator. Researchers should bear in mind, however, that the results of standardization are merely artificially constructed indicators; they do not represent a real population or circumstance.