According to Heckman and Singer, "Longitudinal data are widely and uncritically regarded as a panacea . . . The conventional wisdom in social science equates 'longitudinal' with 'good' and discussion of the issue rarely rises above that level" (1985, p. ix).
There is probably no methodological maxim in sociology more often repeated than the call for longitudinal data. From the work of David Hume more than 250 years ago, to the exhortations for a "radical reformation" in the work of Stanley Lieberson (1985, p. xiii), the importance of longitudinal data has been emphasized and reemphasized. Yet it is doubtful that there is an area of sociological method in which more disagreement exists both as to rationale and as to method. Until relatively recently, it has been possible to ignore the problem because longitudinal data have been relatively rare, and methods for their analysis quite sparse. Since 1965, however, there has been a virtual flood of new longitudinal data and a concomitant increase in sophisticated methods of analysis. In large part the computer has made both developments possible, permitting the management of data sets of enormous complexity along with analyses undreamed of only a short while ago.
At the micro level, numerous longitudinal studies have tracked individuals over a good share of their lifetimes. For example, some participants in the Oakland and Berkeley studies of growth and development have been sporadically studied from birth until well into their seventies (Elder, this volume). On a more systematic basis, the Panel Study of Income Dynamics (PSID) has interviewed a panel based on an original sample of five thousand families (households) on an annual basis since the mid 1960s, supplementing the sample with new households formed as split-offs from the original families (Duncan and Morgan 1985). Many other large-scale panel studies, some extending over periods of thirty years and longer, are in progress (Migdal et al. 1981).
At the macro level, extended time series on various social and economic indicators such as GNP, fertility, mortality, and education are gradually becoming available in machine-readable form from virtually all industrialized societies and from many that are less developed. In some cases, data series, particularly for vital statistics, go back for decades. In other cases, such as China and the Soviet Union, modern-era data are gradually being accumulated and linked to earlier series. Descriptions of many such data sets can be found in the annual guide published by the Inter-University Consortium for Political and Social Research (ICPSR 1991).
Perhaps the most exciting developments are at the nexus of macro- and micro-level analysis. In the United States, for example, the General Social Survey (GSS) has obtained data on repeated cross-sectional samples of the population (that is, the population is followed longitudinally but specific individuals are not) on an annual basis (with two exceptions) since 1972. More recently, annual surveys modeled on the GSS have been started in a number of other countries (Smith 1986). Because of the careful replication, these surveys permit one to track aggregate responses of the changing population on a wide variety of issues (such as on attitudes toward abortion or capital punishment) over time. As the time series becomes ever longer, it is possible to carry out a multilevel analysis, linking micro and macro variables. For example, using the GSS, DiPrete and Grusky (1990) attempt to link short-term changes in rates of social mobility to macro-level changes in the U.S. economy.
As the size and complexity of the longitudinal data base have expanded rapidly, so have the statistical tools with which to analyze it. Perhaps the most exciting development is in the area of "event history models," which permit one to relate an underlying rate of transition in continuous time to a set of "covariates" or independent variables. However, event models are by no means the only development. New and powerful approaches to the analysis of means over time, to structural equation models, and to various forms of categorical data analysis have given researchers unprecedented power. Standard computer packages now make routine what was impossible in the 1980s.
THE RATIONALE FOR LONGITUDINAL RESEARCH
Longitudinal studies are carried out in virtually every area of the social sciences. Although studies of infant development and national development share certain similarities, it is not likely that a single rationale, design, or approach to analysis will simultaneously apply to every area in which longitudinal data might be collected. At the most abstract level, there are three basic reasons for conducting a longitudinal study.
First, in any area in which development and change over time are at issue, there is, almost by definition, a necessity to obtain time-ordered data. Questions pertaining to rate and sequence of change, and to variability in rates and sequences are at the heart of longitudinal research. At one level, these questions are essentially descriptive, and getting an adequate descriptive handle on time-ordered data is an essential first step in coming to any understanding of change.
A second reason involves the role of temporal priority in causal analysis. There are few things on which philosophers of science agree when it comes to causation (see Marini and Singer 1988 for a superb review), but one is that A must precede B in time if A is to be taken as a cause of B. It is natural to assume that observing A before B means that A precedes B. Unfortunately, designs that actually allow one to establish temporal, let alone causal, priority are not as easily arrived at as one might think.
Related to the issue of temporal priority is the cross-sectional fallacy. Lieberson (1985, pp. 179-183) argues that assertions of causality based on cross-sectional data must necessarily imply a longitudinal relationship. To show that city size "leads to crime" based on cross-sectional data implies a dynamic relationship that may or may not be true. The problem is particularly acute in cross-sectional age comparisons that are necessarily confounded with cohort differences. All cross-sectional attempts to ascertain the "effect" of age on income confound cohort differences in average income with age differences.
A third reason, particularly relevant to sociologists, is the necessity to distinguish gross change from net change. A census taken at two points in time shows changes in the distribution of a variable, say occupation, at the macro level. We might find that the proportion of the population in service-oriented occupations increases over the period of a decade. That indicator of net change conceals myriad patterns of gross change at the individual level. The count of persons in service occupations at two points in time consists of persons who were occupationally stable over the interval and persons who changed in various ways. Of course the population itself changed over the interval due to age-related changes in the labor force, migration, and differing levels of labor-force participation. All of this is masked by repeated cross-sectional samples.
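The distinction between net and gross change can be sketched with a small transition table. The counts below are invented for illustration; the point is only that a modest net shift can conceal a much larger volume of individual-level movement.

```python
# Toy illustration (hypothetical counts): net change can mask gross change.
# Keys are (occupation at time 1, occupation at time 2).
table = {
    ("service", "service"): 400,   # occupationally stable
    ("service", "other"):   100,   # left service work
    ("other",   "service"): 150,   # entered service work
    ("other",   "other"):   350,
}

service_t1 = sum(n for (a, _), n in table.items() if a == "service")
service_t2 = sum(n for (_, b), n in table.items() if b == "service")

net_change = service_t2 - service_t1          # what repeated cross-sections show
gross_change = table[("service", "other")] + table[("other", "service")]

print(net_change)    # 50: a modest net shift into service occupations
print(gross_change)  # 250: five times as many individual moves underlie it
```

Repeated cross-sectional samples would report only the net figure; a panel design is needed to recover the gross flows.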
Finally, although not really a ”rationale” for longitudinal research, observation plans with repeated measures on the same individuals can offer certain statistical advantages. Cook and Ware (1983) discuss these in detail.
TYPES OF LONGITUDINAL DATA
For many sociologists, the term longitudinal connotes a particular design, usually referred to as a panel study, in which individual subjects are measured repeatedly over time. A more general definition is desirable. Longitudinal data consist of information that can be ordered in time. Such data can be obtained in a variety of ways: by measuring the subject prospectively at repeated intervals, by obtaining a retrospective history in one or more interviews, from institutional records, or various combinations of these approaches. "Strong" longitudinal data preserve exact time intervals, while "weak" data provide sequence and order but lose interval. The distinction is parallel to that between interval and ordinal measurement.
As Featherman (1977) notes, under some circumstances, retrospective data collection may have substantial advantages of time and cost. Most surveys collect retrospective data of one kind or another, such as educational attainment, family background, and marital histories. Using structured interviewing methods in which the respondent is provided with a time-oriented matrix, it is possible to collect quite accurate information on many aspects of the life course. For example, as part of an ongoing panel, Freedman et al. (1988) obtained retrospective reports of family structure and other demographic variables that had previously been measured contemporaneously. The results are highly encouraging; when retrospective questions of this kind are asked carefully and interviewers are well trained, respondents can provide accurate and detailed information.
Of course not all variables can be measured retrospectively. Most researchers would argue that any attempt to measure past psychological states is invalid on its face. Reporting of the timing and frequency of events, even those which are quite salient, such as hospitalization, appears to suffer from serious recall problems. Subjects tend to forget them or to telescope them in time. Reports of exact earnings, hours worked, and other economic variables may suffer from similar problems. The truth is that we don’t have much information on what can and cannot be measured retrospectively. A systematic research program on these issues is becoming more and more necessary as new methods of data analysis that depend on the exact timing of events continue to evolve.
Another serious weakness of the retrospective design is that it represents the population that survives to the point of data collection and not the original population. In some situations this kind of selection bias can be quite serious—for example, in intervention studies that are subject to high levels of attrition.
Prospective studies in which a subject is measured at some baseline and then at repeated intervals are usually referred to as panel studies. Panel designs have a number of strengths along with several significant weaknesses. The primary strength, at least in principle, is accuracy of measurement and correct temporal referents. Depending on the exact design of data collection, subjects are measured at a point close enough in time to the event or status in question to permit reliable and valid assessment. Under certain circumstances temporal priority may be more firmly established.
Second, the prospective design provides greater leverage on attrition. Besides measuring a population defined at baseline, preattrition information can be used to determine the extent to which attrition is "random" and perhaps can be used to correct estimates for selection bias. There is a trade-off, however. Frequent measurement has two potentially undesirable side effects. First, subjects may tire of repeated requests for cooperation and drop out of the study. Second, "panel conditioning" may result in stereotypic responses to repeated questions. Thus, relative to the retrospective design, there may actually be a net decrease in data quality.
On the surface, prospective designs that extend in time are far more costly than retrospective designs. There is a clear cost/quality trade-off, however, that cannot be easily evaluated without consideration of the purposes of the survey. In obtaining population estimates over time, the panel may actually be less expensive to maintain than resampling the population repeatedly. On the other hand, using a panel for this purpose brings problems of its own in the form of attrition and panel conditioning.
QUASI-EXPERIMENTAL AND DESCRIPTIVE APPROACHES
The large-scale, multiwave surveys so common now have rather diffuse origins. Paul Lazarsfeld introduced the panel design (Lazarsfeld and Fiske 1938). In his hands, the panel study was basically a quasi-experimental design. A panel of subjects was recruited and measured repeatedly, with the foreknowledge that a particular event would occur at a particular time. The most famous application is to election campaigns. As such, the design is a simple example of an interrupted time series.
A second source of current designs is the child development tradition. Baltes and Nesselroade (1979) cite examples going back to the late eighteenth century, but systematic studies date to the 1920s (Wall and Williams 1970). The best of these studies emphasized cohort-based sampling of newborns and systematic assessment of development at carefully determined intervals. In the tradition of experimental psychology, investigators paid attention to careful control of the measurement process, including the physical environment, the raters and observers, and the measurement instruments. The development of age-specific norms in the form of averages and variation about them was seen as a primary goal of the study. Unanticipated events, such as an illness of either mother or child or factors affecting the family, were seen as undesirable threats to the validity of measurement rather than as opportunities to assess quasi-experimental effects.
Large-scale multiwave panel studies of the kind described above combine aspects of both traditions, often in somewhat inchoate and potentially conflicting ways. On the one hand, investigators are interested in describing a population as it evolves. Often basic descriptive information on rates, variability, and sequence is unknown, calling for frequent measurement at fixed intervals. On the other hand, there is also interest in evaluating the impact of specific transitions and events, such as childbearing, retirement, and loss of spouse. Meeting these two objectives within the constraints of a single design is often difficult.
Although it might be argued that the ideal longitudinal study should take a huge sample of the population without age restriction, measure it constantly, and follow it in perpetuity with systematic supplements to the original sample, cost and logistics intervene. Although there is an infinite range of potential designs, longitudinal studies can be classified on various dimensions including (a) the consistency of the sample over time, (b) population coverage, particularly with regard to age, and (c) measurement protocols, including not only choice of variables but also timing, interval, and frequency. These factors will influence the extent to which the study can be used for descriptive purposes, relating information to a well-defined population, and/or drawing causal inferences on temporal and quasi-experimental aspects of the design.
Consistency of the Sample. The following classification is based on Duncan and Kalton (1987) and on Menard (1991).
1. Repeated Cross-Sectional Surveys. A new sample is drawn at each measurement point. The GSS, described above, is an example. This is a longitudinal study at the macro level. It describes a dynamic population.
2. Fixed-Sample Panel. A sample is drawn at baseline and measured repeatedly over time. No new subjects enter the sample after baseline. Several examples are described above. The sample refers only to the cohorts from which it was drawn. It may continue to represent them adequately if panel attrition is completely at random.
3. Dynamic Sample Panel. After baseline, subjects are added to the panel in an attempt to compensate for panel attrition and represent changes in the underlying population. The PSID is a major example.
4. Rotating Panels. A sample is drawn and interviewed for a fixed number of waves and then dropped. At each measurement point a new sample is also drawn so that samples enter and leave on a staggered basis. The best-known example is the Current Population Survey carried out by the U.S. Bureau of the Census. At any given time, the sample consists of subjects who have been in the panel from one to four months.
5. Split Panels. In addition to a basic panel survey, a coordinated cross-sectional survey is drawn at each measurement point. In effect, this is a quasi-experimental design in which comparisons between samples permit tests of panel conditioning, among other things. This design is rare.
Population Definition. The broader the population sampled, the wider the potential generalization. On the other hand, homogeneity provides greater leverage for some kinds of causal inference. The following rough classification is useful. See Campbell (1988) for elaboration.
1. Unrestricted Age Range. A sample of the entire (adult) population is selected and followed.
2. Restricted Age Range. A sample of persons in a narrow age band, such as adolescents in developing nations, is selected, with resulting homogeneity of developmental process.
3. Event-Based. A sample is selected on the basis of a particular event. Birth is the prime example; others are motherhood, school completion, business start-up, and administrative reorganization. Subjects can be members of a cohort experiencing the event at a fixed time or can be drawn on a floating baseline, as is the case when each new patient in a given clinic is followed prospectively.
4. Population at Risk. A sample is selected on the likelihood (but not the certainty) that it will experience a particular event. Although similar to an event-based sample, it is less controlled. Age-restricted samples are usually at risk for certain events, which is one reason for restricting age in the first place. An interesting example at the macro level is a sample of cities likely to experience a disaster such as an earthquake or a hurricane.
Measurement Protocols. What variables should one measure with what frequency at what time intervals? Answering such a question requires a clear appreciation of the linkage between the substantive purpose of an investigation and the mode of data analysis. For example, if one’s intent is to study labor-force participation using event history models, then frequency of measurement would be dictated primarily by (a) the frequency of change of labor-force status among subjects, (b) the amount of variability in individual rates of change, (c) the necessity to obtain exact dates of transitions for the analysis, and (d) the maximum time interval at which subjects can provide reliable and valid recall data. If the investigators have explanatory models in mind, then the measurement protocol will have to deal with similar issues for the regressors.
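The link between measurement protocol and analysis can be made concrete for the simplest event history model. If transitions occur at a constant rate and the study ends before every subject has made the transition, the maximum likelihood estimate of the rate is the number of observed events divided by total exposure time. The sketch below uses simulated data with an invented rate and study length; it is an illustration of why dated transitions (not just status at fixed intervals) are what such models consume.

```python
import random

random.seed(6)
true_rate = 0.25        # hypothetical transitions per year
study_end = 5.0         # hypothetical observation window in years

durations, events = [], []
for _ in range(2000):
    t = random.expovariate(true_rate)   # exact time of the transition
    if t <= study_end:
        durations.append(t)
        events.append(1)                # transition observed and dated
    else:
        durations.append(study_end)
        events.append(0)                # right-censored at end of study

# MLE for a constant rate with right-censoring: events / total exposure time.
rate_hat = sum(events) / sum(durations)
print(round(rate_hat, 2))               # close to true_rate
```

Note that the estimator needs each subject's exact duration; a protocol recording only "working / not working" at annual interviews would force the analyst to approximate these durations.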
If one is, however, interested in the effects of widowhood, which might be studied either descriptively or as a quasi-experiment using an interrupted time series approach, very different measurement strategies would be optimal. At the very least, one would try to schedule measurements at fixed intervals preceding and following the event. If economic effects were the primary focus, annual measurement might be sufficient; but if the grief process was the focus, more frequent measurement would be required.
The more undifferentiated the sample, and the more multipurpose the study, the more difficult it is to achieve an effective measurement strategy. Many of the large-scale longitudinal studies carried out since the 1960s have one or more of the following characteristics that tend to interfere with analysis:
1. Multiple substantive foci that result in attempts to measure scores, if not hundreds, of variables.
2. Nonoptimal measurement intervals because of conflicting demands of different kinds of variables and topics. A secondary problem is that intervals are often determined by administrative and funding criteria rather than by substantive intent.
3. Measurement strategies that are chosen without regard to statistical analysis. The problem is acute for event history models that require dated transitions rather than reports of status at fixed intervals. Other examples are failure to acquire multiple indicators of constructs for LISREL models and intersubject variation in measurement intervals that interferes with growth curve estimation.
4. Weak identification of temporal sequence. This problem is most often a result of a "snapshot" orientation to measurement in which the subject's status is ascertained at a sequence of fixed measurement points. Knowing that at time 1 someone is married and prosperous and that at time 2 that person is single and poverty-stricken doesn't tell us much about causal order or effect.
CAUSAL INFERENCE AND LONGITUDINAL DESIGN
As noted, the extent to which longitudinal studies allow one to establish causal effects is the subject of some controversy. Indeed, the whole issue of causal inference from nonexperimental data is extremely controversial (Berk 1988; Freedman 1991), with some taking the position that it is impossible in any circumstance other than a randomized design involving an experimentally manipulated stimulus. Although that may be an extreme position, the assumption that one can use time-ordered observational data to establish causal order is difficult to defend.
Cross-sectional analyses and retrospective designs suffer from the fact that variables which are supposed to have time-ordered causal effects are measured simultaneously. As a result, various competing explanations of observed associations, particularly contaminated measurement, cannot be ruled out. Asking about educational aspirations after completion of schooling is an example. Panel designs at least have the advantage of measuring presumed causes prior to their effects. Even in that case, establishing temporal order is not as easy as it might appear. This is particularly true when one attempts to relate intentions, attitudes, and similar cognitive variables to behaviors. Marini and Singer (1988) give an example based on education and marriage. Consider two women who leave school and get married. One may have decided to terminate her education in light of a planned marriage, and the other may have decided to marry following a decision to terminate education. The observed sequence of events tells us nothing. They note:
Because human beings can anticipate and plan for the future, much human behavior follows from goals, intentions and motives, i.e., it is teleologically determined. As a result, causal priority is established in the mind in a way that is not reflected in the temporal sequences of behavior or even in the temporal sequence of the formation of behavioral intentions. (Marini and Singer 1988, p. 377)
Because of the many varieties of longitudinal research design and the many controversies in the field, it is difficult to give hard-and-fast rules about when causal inference may be justified. Dwyer (1991), writing for epidemiologists, provides a useful way to approach the problem based on whether there is variance in presumed causes and effects at baseline.
Variation in Independent Variable
Examples include the following:
I. Follow crime-free baseline sample through possible arrest, incarceration, and later recidivism.
II. Relate aspirations of eighth graders to eventual level of educational attainment.
III. Carry out an experimental intervention, such as a job training program, that attempts to increase existing skill levels.
IV. Relate organizational characteristics at baseline to later levels of productivity.
Cases I and III can be observational or (quasi-) experimental. Cases II and IV are strictly observational. In each case, although variation may not exist in the variable of direct interest, there may be variation in closely related variables. In the experimental case, this corresponds to lack of randomization and, thus, uncontrolled variation in potential causal variables. The same is true in purely observational studies. In case II, although there is no variation in the direct outcome of interest, there is often variation in related variables at baseline. In the example given, although there is no variation in educational attainment in terms of years of schooling completed, aspirations are certainly based in part on the child's prior academic success in the grade school environment. Case IV presents the most difficult situation because it picks up data in the middle of an ongoing causal sequence. This is precisely the situation where many researchers believe that panel data can untangle causal sequence, but neither temporally sequenced observations nor complex analysis is likely to do so.
To reiterate an important point about design, in large-scale longitudinal research each of these four cases is typically embedded in a complex design, and the degree of causal inference depends on many factors ranging from the timing of measurement to attrition processes.
COMMON PROBLEMS IN LONGITUDINAL DATA ANALYSIS
Those who collect and analyze longitudinal data face certain generic problems. Virtually any researcher will face some or all of the following difficulties.
Conceptualizing Change. James Coleman notes that the concept of change is "a second order abstraction . . . based on a comparison, or difference, between two sense impressions, and, simultaneously, a comparison of the times at which the impressions occurred" (1968, pp. 428-429). It is particularly difficult to think about the causes of change, and this difficulty has been reflected in arguments about how to model it. In particular, there has been a running debate about the use of change scores, computed as a simple difference (ΔY = Y2 − Y1), versus regression models in which the time 2 variable is regressed on time 1 plus other variables. In an influential paper, Bohrnstedt (1970) showed that simple gain scores tended to have very low reliability relative to either variable composing them. In light of that and other problems, Cronbach and Furby (1969) argued that the best way to model change was to treat "residualized gain scores" as dependent variables in regression analysis. The basic equation is

Y2 = a + b(Y1) + B(X) + e,

where Y1 is the baseline measure and the X's are any set of independent variables. Hence the effect of X is net of the baseline score. This method has become standard in many fields of inquiry.
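A minimal sketch of the residualized-gain approach, using simulated (invented) data and ordinary least squares solved by hand: the follow-up score is regressed on the baseline score plus a covariate, so the covariate's coefficient is its effect net of baseline.

```python
import random

random.seed(0)
n = 5000
# Hypothetical panel: baseline score y1, covariate x, follow-up score y2,
# generated so that the true net effect of x is 0.8.
y1 = [random.gauss(0, 1) for _ in range(n)]
x  = [random.gauss(0, 1) for _ in range(n)]
y2 = [0.5 * a + 0.8 * b + random.gauss(0, 0.5) for a, b in zip(y1, x)]

def centered(v):
    m = sum(v) / len(v)
    return [u - m for u in v]

cy1, cx, cy2 = centered(y1), centered(x), centered(y2)

# Normal equations for y2 ~ y1 + x, solved by Cramer's rule on centered data.
s11 = sum(a * a for a in cy1)
s1x = sum(a * b for a, b in zip(cy1, cx))
sxx = sum(b * b for b in cx)
s1y = sum(a * c for a, c in zip(cy1, cy2))
sxy = sum(b * c for b, c in zip(cx, cy2))

det  = s11 * sxx - s1x * s1x
b_y1 = (s1y * sxx - s1x * sxy) / det   # stability coefficient, about 0.5
b_x  = (s11 * sxy - s1x * s1y) / det   # effect of x net of baseline, about 0.8

print(round(b_y1, 2), round(b_x, 2))
```

With real data one would use a regression routine rather than hand-solved normal equations; the sketch only shows what "net of the baseline score" means operationally.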
More recently, a number of papers have appeared that question the use of residualized gain scores. Liker, Augustyniak, and Duncan (1985) argue that equations in which one takes differences on both sides of the model (ΔY = a + B(ΔX) + e) have strong advantages, particularly when one wants to "difference out" unchanging characteristics of subjects. Allison (1990) argues that in some cases the difference score as a dependent variable is not only acceptable but necessary. The issue is not purely statistical; it depends in large part on exactly what process one is attempting to model. Suffice it to say here that an issue which was once thought to be resolved has been reopened for serious examination.
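What "differencing out" buys can be shown in a short simulation with invented numbers. Here an unchanging trait of each subject (alpha) affects both x and y, so a cross-sectional regression of y on x is biased; regressing the change in y on the change in x removes alpha and recovers the true coefficient.

```python
import random

random.seed(1)
beta = 1.0      # true effect of x on y (assumed for the simulation)
n = 4000
rows = []
for _ in range(n):
    alpha = random.gauss(0, 1)           # unchanging trait of the subject
    x1 = alpha + random.gauss(0, 1)      # x is correlated with the trait
    x2 = alpha + random.gauss(0, 1)
    y1 = alpha + beta * x1 + random.gauss(0, 1)
    y2 = alpha + beta * x2 + random.gauss(0, 1)
    rows.append((x1, x2, y1, y2))

def slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

naive  = slope([r[0] for r in rows], [r[2] for r in rows])  # biased upward by alpha
diffed = slope([r[1] - r[0] for r in rows],                 # alpha differenced out
               [r[3] - r[2] for r in rows])

print(round(naive, 2), round(diffed, 2))  # naive well above 1.0; diffed near 1.0
```

The trade-off noted in the text remains: differencing protects against unchanging confounders but, as Allison (1990) and others discuss, which model is right depends on the substantive process, not on statistics alone.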
Related to the issue of change scores and their reliability is the problem of regression toward the mean. Whenever a variable is positively correlated over time, subjects with high scores at time 1 will tend to have somewhat lower scores at time 2, while those with lower time 1 scores will tend to have higher scores at time 2. As a result, gain scores will be negatively correlated with initial values. This problem is exacerbated by random measurement error and was a primary reason for the predominance of residualized gain models in the past. Again, however, the issue is one of the underlying model. There are cases where feedback processes do indeed result in regression to the mean (Dwyer 1991) and the regression is by no means an "artifact." There is no question that one always needs to correct for measurement error, but regression to the mean should not necessarily be treated as an annoyance to be gotten rid of by statistical manipulation.
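A small simulation makes the measurement-error mechanism visible. The true scores below (invented for illustration) do not change at all between waves; purely because both observed scores contain random error, the gain score is negatively correlated with the time 1 score.

```python
import random

random.seed(2)
n = 5000
true = [random.gauss(50, 10) for _ in range(n)]          # stable true scores
t1 = [t + random.gauss(0, 5) for t in true]              # observed at time 1
t2 = [t + random.gauss(0, 5) for t in true]              # observed at time 2
gain = [b - a for a, b in zip(t1, t2)]                   # nothing truly changed

def corr(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

r = corr(t1, gain)
print(round(r, 2))   # clearly negative despite zero real change
```

High time 1 scores are partly high by chance, so their errors are unlikely to repeat at time 2; that alone produces the negative correlation.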
Lack of Independence. A standard assumption in elementary statistical models such as least squares regression is that errors are independent across observations; that is, we typically assume cov(e_i, e_j) = 0. If the same subject is observed at two or more points in time, the independence assumption is almost certain to be violated due to omitted variables, correlated errors of measurement, or other problems. This difficulty permeates virtually all statistical approaches to longitudinal data and requires complex methods of estimation.
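To see why repeated observations violate the independence assumption, consider a minimal simulation (hypothetical numbers) in which each subject has a stable effect that the analyst's model omits. That omitted effect ends up in the error term at every wave, so the same subject's errors are correlated across waves.

```python
import random

random.seed(3)
n = 3000
err1, err2 = [], []
for _ in range(n):
    u = random.gauss(0, 1)                 # stable subject effect, omitted from model
    err1.append(u + random.gauss(0, 1))    # model "error" for this subject, wave 1
    err2.append(u + random.gauss(0, 1))    # model "error" for this subject, wave 2

def corr(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

r = corr(err1, err2)
print(round(r, 2))   # about 0.5 here, not the 0 that OLS assumes
```

Naive standard errors computed under the independence assumption would be too small in such data; this is one motivation for the specialized estimation methods the text mentions.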
Articulating Analysis with Observation Plan.
Ideally, one should have in mind a model for how the underlying process of change in a variable articulates with the observation plan. In part, this requires some knowledge of causal lag or how long it takes for changes in X to result in changes in Y. The data analysis method chosen should be congruent with both the underlying process of change and the observation plan. Observation plans may obtain exact dates of transitions (when did you quit your job?), locate the subject’s state at time t (are you currently working?), or obtain retrospective information relative to some other time point (were you working as of June 30?). Studies that have multiple goals are difficult to optimize in this respect.
Construct Validity. In long-term longitudinal studies, the validity of constructs may degrade over time, sometimes through changes in the subjects themselves, sometimes through measurement problems such as conditioning, and sometimes through changes in the environment. For example, the meaning of terms used in political research such as liberal may change with time, or references to specific issues such as abortion may become more or less salient. In growth studies, subjects may "age out" of certain variables. This issue is quite understudied, at least by sociologists.
Measurement Error. Measurement error is a serious problem for statistical analysis, particularly regression, because it results in biased estimates.
The problem is particularly acute when independent variables are subject to error. In the longitudinal case, errors of measurement tend to be correlated across subjects over time, causing even greater difficulty. A major reason for the popularity of structural equation packages like LISREL (Joreskog and Sorbom 1988; Bollen 1989) is that they permit one to model correlated error structures. This is not to say that LISREL is an all-purpose solution; indeed, its error-handling capabilities probably lead to its use in situations where other approaches would be better.
Time-varying Independent Variables. In a typical situation one is faced with a set of fixed independent variables such as age, sex, and race, and a set of variables whose values may change over the course of a study. In studies of income, for example, educational levels and family structure may change between observations. Although work on this problem has been going on in economics for some years (Hsiao 1986), sociologists have been slow to respond. The problem always leads to great statistical and computational complexity.
Missing Data, Attrition, Censoring, and Selection Bias. Attrition in panel studies is inevitable. In rare cases attrition is completely random, but more commonly it is associated with other variables. Standard listwise and pairwise missing data "solutions" are rarely appropriate in this situation. Three types of solutions are available. First, one can develop weights to correct for attrition. Second, one can model the attrition process directly, using "selection bias" models (Heckman 1979; Berk 1983). Finally, one can impute estimates of missing data (Little and Rubin 1987). This entire area is controversial and under rapid development. A special case of attrition occurs when observations are censored in such a way that measurement stops before a transition of interest, such as ending a panel study of fertility before all subjects have completed childbearing.
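The bias from nonrandom attrition is easy to demonstrate with invented numbers. In the sketch below, low-income respondents are likelier to drop out, so the mean computed from the surviving panel overstates the population mean; this is the situation that weighting, selection models, or imputation are meant to repair.

```python
import random

random.seed(4)
n = 10000
income = [random.gauss(30, 10) for _ in range(n)]   # hypothetical baseline incomes

# Nonrandom attrition: probability of remaining in the panel rises with income.
kept = [y for y in income if random.random() < min(1.0, y / 40)]

full_mean = sum(income) / len(income)
kept_mean = sum(kept) / len(kept)
print(round(full_mean, 1), round(kept_mean, 1))   # completers look richer
```

Because the retention probabilities here are known, inverse-probability weighting (weighting each completer by 1 / P(retained)) would recover the full-sample mean; in practice those probabilities must themselves be estimated from preattrition information.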
APPROACHES TO DATA ANALYSIS
The period since 1975 or 1980 has seen enormous increases in the power and sophistication of longitudinal data analysis. This material is much too complex to cover in this brief review. A more reasonable goal is to provide a few examples of the kinds of substantive questions that one can ask with longitudinal data and appropriate methods for answering them.
Outcomes at Fixed Points in Time. Example: "predicting" a person's savings at age sixty-five. This is the simplest longitudinal question; indeed, in one sense it is not longitudinal at all. The dependent variable involves change over time very indirectly, and the independent variables usually involve life cycle variables like the nature of labor-force participation. These data can often be collected retrospectively, and the only advantage of a repeated measures approach is control over measurement error.
It is not uncommon to see models for fixed outcomes at arbitrary time points—for example, level of educational attainment as of the last available wave of an ongoing panel study. This is a particular problem when the sample consists of subjects who vary in age or was not defined on the basis of some reasonable baseline. When the outcome is a discrete transition, such as marriage, censoring is often a problem.
Means over Time. Example: Comparison of aspiration levels of boys and girls over the middle school years. With a sequence of independent samples, this is a straightforward analysis-of-variance problem. If the same individuals are measured repeatedly, a number of problems arise. First, the observations are not independent over time; the usual statistical assumption of independent error terms will be violated. Within the ANOVA tradition, this problem can be approached via multivariate analysis of variance or as a classic univariate design. Hertzog and Rovine (1985) compare the two approaches. Sociologists have tended to ignore mean comparisons, preferring to work with structural equations; however, analyses of means are often far more direct and informative. See Fox (1984) for an interesting application to an interrupted time series.
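The consequence of ignoring this dependence can be seen in a small simulation (a hypothetical sketch; the effect size and variances are invented). When the same individuals are measured at two waves, the correct standard error for the change in means comes from the paired differences, not from treating the waves as independent samples:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Simulated aspiration scores for the same students at two waves:
# a stable person effect induces correlation across waves.
person = rng.normal(size=n)                    # stable individual level
wave1 = person + rng.normal(scale=1.0, size=n)
wave2 = 0.3 + person + rng.normal(scale=1.0, size=n)  # true change = 0.3

diff = wave2 - wave1
change = diff.mean()

# Correct SE treats the data as paired (repeated measures).
se_paired = diff.std(ddof=1) / np.sqrt(n)

# SE under the (false) independence assumption ignores the positive
# covariance between waves and is too large here.
se_indep = np.sqrt(wave1.var(ddof=1) / n + wave2.var(ddof=1) / n)

print(change, se_paired, se_indep)
```

With positively correlated waves the independence-based standard error is inflated, so tests of mean change are unnecessarily conservative; with negatively correlated errors the bias would run the other way.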
Classic ANOVA ignores individual-level heterogeneity. Over the course of five observation points, students vary about the group means. The description of change at the individual level rather than at the group level may be of some interest. A simple approach is to use "blocking variables" such as race to account for additional heterogeneity. Adding covariates to the model is another. A more sophisticated approach is to model the individual-level growth curve directly. Conceptually, analysis begins by estimating an equation to describe each respondent's score with respect to time, often using a polynomial model. The coefficients of these equations are then treated as dependent variables in a second-stage analysis. There are a number of different statistical approaches to this kind of analysis. Rogosa (1988) has argued forcefully in favor of approaching longitudinal analysis in this way. Dwyer (1991) provides an interesting example. McArdle and Aber (1990) deal with such models in a LISREL context.
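The two-stage logic can be sketched in a deliberately simplified form (hypothetical data; a first-degree polynomial per respondent, then a regression of the estimated slopes on a blocking variable):

```python
import numpy as np

rng = np.random.default_rng(2)
n, waves = 500, 5
t = np.arange(waves, dtype=float)

# Simulated panel: each student has an intercept and a linear slope;
# girls (group = 1) grow faster by 0.5 units per wave (invented numbers).
group = rng.integers(0, 2, size=n)
intercepts = 10 + rng.normal(scale=1.0, size=n)
slopes = 1.0 + 0.5 * group + rng.normal(scale=0.2, size=n)
scores = intercepts[:, None] + slopes[:, None] * t \
    + rng.normal(scale=0.5, size=(n, waves))

# Stage 1: fit a line to each respondent's scores over time.
X = np.column_stack([np.ones(waves), t])
coefs, *_ = np.linalg.lstsq(X, scores.T, rcond=None)  # shape (2, n)
est_slopes = coefs[1]

# Stage 2: treat the individual slopes as a dependent variable
# and regress them on the blocking variable.
Z = np.column_stack([np.ones(n), group])
b, *_ = np.linalg.lstsq(Z, est_slopes, rcond=None)
print(b)  # [baseline slope, extra growth for girls]
```

Modern multilevel and latent-growth approaches estimate both stages jointly rather than sequentially, but the two-step version conveys the idea.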
Structural Equation Systems. Figure 1, based on Blau and Duncan's classic path model of occupational mobility, is a longitudinal analysis, although it is not often thought of in those terms. For our purposes, the important point is that the exact timing of variables was not assumed to be of great import, although the sequence was. Education was taken to intervene between measures of family background and occupational attainment, and its timing was not specified. The latter variable was assumed to reach some plateau relatively early in the occupational career; again, timing was not important. Models of this kind, with rather vague time referents and in which it is assumed that the order of events does not vary across subjects, have played an important role in sociology for some time, adding great clarity to the notion of intervening variables. It is tempting to assume that structural equation models are the natural way to analyze multiwave panel data.
Kessler and Greenberg (1981) deal at length with the application of structural equation models to panel data, particularly to cross-lagged structures where both variables vary at baseline. They show that attempting to estimate relative causal effects must be handled in the context of a formal model rather than by comparing cross-lagged partial correlations. Figure 2, based on Campbell and Mutran (1982), is a multiwave panel model with intervening variables. Here, a number of statistical and conceptual difficulties become obvious. First, what is the lag time for the effect of health on income satisfaction and vice versa? Second, how does one deal with the potentially complex error structure of observed variables? Third, if illnesses intervene (as measured by times in hospital), how does the timing of events relative to the timing of observations affect the results? Fourth, how does one deal with discrete intervening variables that fail to meet standard distributional assumptions? Finally, how does one handle correlated errors in equations over time? These are by no means the only questions that could be raised, and it is not clear that models of this kind are the most effective way to approach multiwave panel data.
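The point about formal models rather than comparisons of partial correlations can be made concrete with a minimal two-wave sketch (hypothetical variables and coefficients, estimated by ordinary least squares): each wave-2 variable is regressed on both wave-1 variables, so each cross-lagged effect is estimated net of the lagged dependent variable.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10000

# Simulated two-wave data: health at wave 1 affects income
# satisfaction at wave 2 (cross-lag 0.4), but not the reverse.
health1 = rng.normal(size=n)
satis1 = 0.3 * health1 + rng.normal(size=n)
health2 = 0.6 * health1 + 0.0 * satis1 + rng.normal(size=n)
satis2 = 0.4 * health1 + 0.5 * satis1 + rng.normal(size=n)

def ols(y, X):
    """Least-squares coefficients for y on the columns of X, with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Each wave-2 variable is regressed on BOTH wave-1 variables.
b_satis = ols(satis2, np.column_stack([satis1, health1]))
b_health = ols(health2, np.column_stack([health1, satis1]))

print(b_satis[2], b_health[2])  # the two cross-lagged effects
```

This sketch sidesteps the measurement-error and lag-specification problems raised in the text; it shows only the structural core of a cross-lagged specification, in which the asymmetry of effects is recovered from the regression coefficients rather than from a comparison of correlations.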
[Figure 1. A Basic Model of Attainment]
Event History Models. This class of models treats the underlying rate of change in a discrete variable as the dependent variable in a regression-like framework. Typically, the dependent variable is conceived of as the instantaneous risk of transition at time t conditional on prior history. The formulation is dynamic rather than static, and time enters the analysis explicitly. Allison (this volume) discusses a number of such models in detail. At first glance, the model may seem mathematically esoteric, but the basic ideas are straightforward. The underlying concepts are closely related to the life table and to Markov models. The Markov model, as applied in the classic labor-force studies of Blumen, Kogan, and McCarthy (1955), assumes a transition process that is constant over time and invariant across subjects. Event history models allow one to relax both assumptions, specifying a time-dependent rate of change and a set of individual-level "covariates" that allow transition processes to vary with characteristics of the subjects. Covariates can be allowed to change with time as well, at the cost of substantial computational complexity.
[Figure 2. A Model for Health and Income Satisfaction]
In one of the first applications in sociology, Tuma, Hannan, and Groeneveld (1979) showed that the rate of marital dissolution among subjects in an income-maintenance experiment depended on the treatment condition, thus demonstrating that event models could be used for causal analysis. Event models are inherently dynamic and take the timing of transitions explicitly into account. They are having enormous impact not only on how sociologists analyze data but also on how they conceptualize problems and design data collection.
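A minimal event-history calculation can be sketched in the constant-rate (exponential) special case with right censoring (hypothetical rates and treatment effect; real applications such as the income-maintenance analysis use far richer specifications):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4000

# Simulated dissolution times under a constant hazard, with a
# treatment covariate that doubles the rate (invented numbers).
treated = rng.integers(0, 2, size=n)
rate = np.where(treated == 1, 0.10, 0.05)   # events per year
event_time = rng.exponential(1.0 / rate)

# Observation ends at 5 years: later times are right-censored.
observed = np.minimum(event_time, 5.0)
is_event = event_time <= 5.0

# For a constant hazard, the maximum-likelihood estimate is
# (number of events) / (total exposure time); censored cases
# contribute exposure but no event.
def hazard(mask):
    return is_event[mask].sum() / observed[mask].sum()

h_control = hazard(treated == 0)
h_treated = hazard(treated == 1)
print(h_control, h_treated, h_treated / h_control)
```

Note how the censored observations are retained rather than discarded: dropping them would overstate both rates, since early events would be overrepresented.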
Differential Equation Models. Coleman (1964, 1968, 1981), among others, has argued that the appropriate way to study change in continuous variables is via differential equation models. A simple differential equation relating Y to its own level and to an exogenous variable X takes the form

dY/dt = a + b1*Y + b2*X,

where dY/dt is the rate of change in Y with respect to time, that is, the derivative. The rate of change is not directly observed, of course; and to estimate the coefficients of the model, it is necessary to integrate the equation. Coleman (1968) shows that for this particular model, it is possible to obtain estimates of a, b1, and b2 by first estimating a regression equation of the form

Y(t) = a* + b1*Y(0) + b2*X

using ordinary regression, where Y(0) is the initial value of Y and t is the length of the observation interval, and then transforming the regression coefficients a*, b1*, and b2* to obtain the coefficients of the differential equation model. In this particular case the coefficients of the differential equation model are

b1 = ln(b1*)/t,   b2 = b2*·b1/(b1* − 1),   a = a*·b1/(b1* − 1).
The resulting coefficients describe the time path of Y as a function of its initial value and the value of X. Note that this model assumes that X is not changing over time. One implication of this model is that if one wishes to assume that Y at time t does not depend on its initial value, then residualized gain score models of the kind described above are inappropriate.
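Coleman's two-step procedure can be checked numerically (an illustrative sketch; the coefficient values, interval length, and noise level are invented, and X is held fixed over the interval as the model assumes):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
a, b1, b2 = 2.0, -0.5, 1.0   # true differential-equation coefficients
t = 1.0                      # length of the observation interval

# Simulate dY/dt = a + b1*Y + b2*X with X constant over the interval.
# Integrating gives an exact solution linear in Y(0) and X.
x = rng.normal(size=n)
y0 = rng.normal(size=n)
growth = np.exp(b1 * t)
y1 = y0 * growth + (a + b2 * x) * (growth - 1.0) / b1
y1 = y1 + rng.normal(scale=0.01, size=n)   # small measurement noise

# Step 1: ordinary regression of Y(t) on Y(0) and X.
X = np.column_stack([np.ones(n), y0, x])
a_star, b1_star, b2_star = np.linalg.lstsq(X, y1, rcond=None)[0]

# Step 2: transform the regression coefficients back to the
# coefficients of the differential equation.
b1_hat = np.log(b1_star) / t
b2_hat = b2_star * b1_hat / (b1_star - 1.0)
a_hat = a_star * b1_hat / (b1_star - 1.0)
print(a_hat, b1_hat, b2_hat)
```

The recovered values match the coefficients used to generate the data, which is the essential content of Coleman's transformation; with real data the adequacy of the constant-X and constant-coefficient assumptions would of course be at issue.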
Applications of differential equation models in sociology have been relatively rare, although they would seem to be a natural way to approach the analysis of change. Many of the seemingly endless arguments about the representation of change in regression models stem from the application of static methods to dynamic processes. The difficulty is that it is not easy or always possible to transform the differential equations in such a way that they can be estimated by simple regression. Doing so requires considerable mathematical sophistication on the part of the researcher. Tuma and Hannan (1984) discuss these and related models at length. Arminger (1986) provides an example of how to recast a standard three-wave LISREL-based panel analysis into a differential equation model.
It is no accident that words like controversial and under development recur frequently in this review. Many important issues regarding the collection, analysis, and interpretation of longitudinal data remain to be resolved. Frequently, what are in fact conceptual issues are argued in statistical terms. The literature is technical and mathematically demanding. But important progress is being made and will continue as new methods of research emerge.