Longitudinal studies follow the same units (individuals, households, communities, etc.) over time, with repeated waves of data gathering. They may be small-scale studies of a particular group, large-scale surveys of samples of national populations, or large-scale social experiments.
DISTINCTIONS FROM OTHER TYPES OF STUDIES
In contrast to longitudinal studies, cross-sectional studies collect data at one point in time; repeated cross-sectional studies collect data from comparable, but not the same, units at successive time points. This distinction, however, depends on the unit of inference. For example, in the 1970s the researchers Howard M. Bahr, Theodore Caplow, and Bruce A. Chadwick replicated the 1924 Middletown study in the form of a repeated cross-section, meaning they asked the same questions as in 1924 of a new random sample of the town’s inhabitants. With respect to individuals the design was cross-sectional, but inferences about the social structure and prevailing attitudes of the community were longitudinal, because the community itself was the unit observed at both time points.
Both panel studies and cohort studies trace units through time, with the distinction that a cohort study chooses units experiencing an event in the same time period (e.g., a birth or age cohort as in the National Longitudinal Survey of Youth [NLSY]), whereas a panel study uses a sample, random or otherwise, chosen without respect to a defining event (e.g., the Panel Study of Income Dynamics [PSID]).
Collecting data longitudinally confers several advantages. Such data are necessary for studying individual processes such as aging, labor force transitions, and family formation and change. Longitudinal data are especially crucial when the research attempts to establish causality. Only with longitudinal data can the criterion of temporal priority be established unambiguously. In cross-sectional data one can sometimes infer temporal order, but often one must rely on respondents’ memory, which is frequently unreliable. (This criticism applies equally to designs that are sometimes called longitudinal but that collect data retrospectively. The recall of attitudes and opinions is so influenced by the current state of those variables as to make recall an exercise in creative imagination.) In addition, if the research purpose is to study "flows" (e.g., the number of people who move into or out of unemployment during a month) rather than "stocks" (e.g., the number of people unemployed on a given date), then longitudinal data are absolutely required. The study of flows employs a turnover table, which cross-classifies each unit’s status at one wave by its status at the next, as illustrated by Bernard Levenson’s 1978 article.
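The difference between stocks and flows can be made concrete with a small sketch. The six respondents, their identifiers, and their statuses below are invented for illustration and are not drawn from Levenson’s article; the function names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical employment status of the same six people at two waves.
wave1 = {"p1": "employed", "p2": "employed", "p3": "unemployed",
         "p4": "unemployed", "p5": "employed", "p6": "unemployed"}
wave2 = {"p1": "employed", "p2": "unemployed", "p3": "employed",
         "p4": "unemployed", "p5": "employed", "p6": "employed"}


def stock(wave, status):
    """Count the people holding `status` at a single wave (a stock)."""
    return sum(1 for s in wave.values() if s == status)


def turnover_table(first, second):
    """Cross-classify each person's status at the first wave by the second.

    Only people observed at both waves can be classified, which is why
    flows cannot be computed from two independent cross-sections.
    """
    return Counter((first[pid], second[pid]) for pid in first if pid in second)


table = turnover_table(wave1, wave2)

# Stocks: unemployment falls from 3 people to 2 people.
print(stock(wave1, "unemployed"), stock(wave2, "unemployed"))

# Flows: 1 person moved into unemployment and 2 moved out of it.
print(table[("employed", "unemployed")], table[("unemployed", "employed")])
```

In this sketch the stock of unemployed people changes by only one, yet three people changed status. The stocks could be obtained from two separate cross-sections, but the flows require matching each person across waves.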
Although temporal priority can be established via a longitudinal study, without a true experiment involving randomization and manipulation of an independent variable, causality cannot be proved. Large-scale social experiments, however, such as the negative income tax experiments of the 1960s and 1970s, involve longitudinal data collection and hence partake of all the advantages and disadvantages of longitudinal studies.
SPECIAL ISSUES IN LONGITUDINAL STUDIES
When investigators analyze data from longitudinal studies in the aggregate, they seek to separate age effects (caused, for example, by physical aging or moving to different age-related roles) from period effects (caused, for example, by changing social environments) and from cohort effects (caused, for example, by different social environments at critical stages in individuals’ development). Work by Erdman Palmore, as presented in his 1978 article, points out that longitudinal differences (between measurements of the same cohort taken at two time periods) are the sum of age and period effects; cross-sectional differences (between measurements of different cohorts taken at the same time period) are the sum of age and cohort effects; and time lag differences (between measurements of an older cohort taken at time one and those on a younger cohort taken at time two, when they have achieved the same age the older cohort was at time one) are the difference between period and cohort effects. Disentangling these effects requires careful thought and analysis, especially if only a single cohort is being followed over time.
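A numerical sketch may help fix the three differences Palmore describes. It assumes, purely for illustration, that age, period, and cohort effects are additive; the effect sizes, the baseline, and the `score` function are invented and do not come from Palmore’s article.

```python
# Hypothetical additive effects on some attitude score (illustrative only).
BASELINE = 50.0
AGE_EFFECT = 5.0     # being at the older age rather than the younger age
PERIOD_EFFECT = 3.0  # being measured at time two rather than time one
COHORT_EFFECT = 2.0  # belonging to the older rather than the younger cohort


def score(older_age, time_two, older_cohort):
    """Expected score under the assumed additive model (flags are 0 or 1)."""
    return (BASELINE
            + AGE_EFFECT * older_age
            + PERIOD_EFFECT * time_two
            + COHORT_EFFECT * older_cohort)


younger_at_t1 = score(older_age=0, time_two=0, older_cohort=0)
older_at_t1 = score(older_age=1, time_two=0, older_cohort=1)
# By time two the younger cohort has reached the age the older cohort had at time one.
younger_at_t2 = score(older_age=1, time_two=1, older_cohort=0)

longitudinal = younger_at_t2 - younger_at_t1   # same cohort, two periods
cross_sectional = older_at_t1 - younger_at_t1  # two cohorts, same period
time_lag = younger_at_t2 - older_at_t1         # same age, different cohort and period

print(longitudinal)     # age + period
print(cross_sectional)  # age + cohort
print(time_lag)         # period - cohort
```

Under this additive assumption the time-lag difference always equals the longitudinal difference minus the cross-sectional difference, so the three observed differences supply only two independent pieces of information about three unknown effects. That is one way to see why separating the effects takes more than arithmetic on the observed differences.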
Because measurements on the same unit at different time points are correlated, the analysis of longitudinal data relies on special statistical techniques that take such correlation into account. Analogous to the analysis of variance models appropriate for repeated measures designs, these techniques employ certain forms of generalized linear models. Dorothy D. Dunlop’s 1994 work and James H. Ware’s 1985 work explicate these techniques.
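Why the correlation matters can be seen in the simplest possible case, a two-wave comparison of means. The sketch below is not one of the generalized linear models discussed by Dunlop or Ware; it is a paired comparison on invented scores, contrasted with a calculation that wrongly treats the two waves as independent samples.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores for the same five units at two waves.
wave1 = [10.0, 12.0, 14.0, 16.0, 18.0]
wave2 = [11.0, 14.0, 15.0, 18.0, 19.0]
n = len(wave1)

# Within-unit changes: each unit serves as its own control.
changes = [after - before for before, after in zip(wave1, wave2)]
mean_change = mean(changes)

# Standard error that respects the pairing of observations within units.
se_paired = stdev(changes) / sqrt(n)

# Standard error that ignores the pairing and treats the two waves
# as if they were independent samples.
se_independent = sqrt(stdev(wave1) ** 2 / n + stdev(wave2) ** 2 / n)

print(round(mean_change, 3), round(se_paired, 3), round(se_independent, 3))
```

Because the two waves are strongly and positively correlated in these made-up data, the standard error that ignores the pairing is several times larger than the one that uses it. With positively correlated measurements, an analysis that disregards the correlation misstates the precision of the estimated change.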
The problem of attrition haunts longitudinal data collection. Researchers lose the opportunity to conduct follow-up when people move out of the study area, refuse to continue, or die. Investigators do their utmost to prevent attrition; for example, the PSID sends postcards to its respondents yearly to determine their whereabouts. Attrition degrades data from a longitudinal study to the extent that data from those not participating in later rounds differ from the data of those participating. Imputation of missing data is, in theory at least, easier for a longitudinal study than for a cross-sectional one, as data from earlier rounds of the study can aid in the effort; Roderick Little’s 1988 work offers examples of such techniques.
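As a minimal sketch of how earlier rounds can inform imputation, the following carries a respondent’s last observed value forward into missed waves. This is only one very simple approach, chosen here for brevity; it is not presented as the method described in Little’s work, and the income figures and the `carry_forward` function are invented.

```python
def carry_forward(values):
    """Fill each missing wave (None) with the most recent observed value.

    Waves that precede the first observation stay None, because there is
    no earlier round to borrow from.
    """
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical income reports over five waves; waves 3 and 4 were missed.
reported = [30000, 32000, None, None, 35000]
print(carry_forward(reported))
```

A cross-sectional survey has no earlier report from the same respondent to draw on, which is the sense in which imputation is easier in the longitudinal case. Carrying a value forward assumes no change during the gap, an assumption that is often questionable for the very processes a longitudinal study is designed to capture.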
To reduce missing data, some longitudinal studies (e.g., NLSY) use a life-history method, asking respondents not only to report their current status on such variables as employment and marital status but also to report and date any changes in these statuses since the previous interview. Thus if a respondent misses one or more interviews, longitudinal data for that respondent are nevertheless available from these retrospective reports, although possibly at some cost in accuracy if respondents’ memories are faulty.
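The following sketch shows how dated status changes can stand in for a missed interview. The dates, statuses, and the `status_on` function are hypothetical and are not taken from the NLSY’s actual data files.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical dated status changes reported retrospectively by one
# respondent, sorted by the date on which each status began.
employment_history = [
    (date(2001, 1, 15), "employed"),
    (date(2002, 6, 1), "unemployed"),
    (date(2002, 11, 20), "employed"),
]


def status_on(history, when):
    """Return the status in force on `when`.

    `history` must be sorted by date. Returns None if `when` falls
    before the first recorded change.
    """
    start_dates = [start for start, _ in history]
    index = bisect_right(start_dates, when)
    return history[index - 1][1] if index else None


# The respondent missed the interview scheduled for this date, but the
# status can still be recovered from the dated changes reported later.
missed_interview = date(2002, 9, 1)
print(status_on(employment_history, missed_interview))
```

The recovered status is only as accurate as the respondent’s recollection of when each change occurred.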
When a longitudinal study is concerned with units other than individuals, additional complications arise. For example, PSID, which studies income dynamics for families, sampled approximately 4,800 families in 1968. Because PSID follows all members of these original families as they leave (children marrying, original couples divorcing, etc.), by 1996 the study encompassed about 8,500 families. To keep the sample representative of the general population, about 440 immigrant families were added in 1996 and almost 1,500 families had to be dropped.
When data are collected repeatedly from the same unit, they present an increasingly detailed, and thus increasingly identifiable, picture of the study subjects. Confidentiality and data security therefore become both more important and more difficult to ensure.
In addition to studies designed longitudinally to study change, several large-scale government surveys (e.g., the Current Population Survey [CPS] and the National Crime Victimization Survey [NCVS]) use rotating panel designs in which a unit participates in the survey for several reference periods. This design makes contacting respondents easier than if a new sample were drawn for every wave and increases the efficiency of statistical estimation. It is difficult for the government agencies collecting these data to link respondents across waves in order to study them longitudinally; nevertheless, such linked files have been produced and used fruitfully in both methodological and substantive research.
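A rotating panel can be sketched with a deliberately simplified design in which one new rotation group enters each period and every group stays for a fixed number of consecutive periods. This is an assumption made for illustration; the actual rotation patterns of the CPS and NCVS are more elaborate, and the function names and parameters below are hypothetical.

```python
def groups_in_sample(period, periods_in_sample):
    """List the rotation groups interviewed in `period`.

    Group g is assumed to enter in period g and to remain for
    `periods_in_sample` consecutive periods (a simplified design).
    """
    first = max(0, period - periods_in_sample + 1)
    return list(range(first, period + 1))


def overlap_share(period, periods_in_sample):
    """Share of the current period's groups also interviewed one period earlier."""
    current = set(groups_in_sample(period, periods_in_sample))
    previous = set(groups_in_sample(period - 1, periods_in_sample))
    return len(current & previous) / len(current)


# With groups staying for 4 periods, period 5 interviews groups 2 through 5,
# and three of those four groups were also interviewed in period 4.
print(groups_in_sample(5, 4))
print(overlap_share(5, 4))
```

The overlap between adjacent periods is what reduces the number of new contacts needed in each wave and what makes it possible, in principle, to link the same units across waves.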