Growth curve analysis (Social Science)

Growth curve analysis, or trajectory analysis, is a specialized set of techniques for modeling change over time. The time frame might be seconds in a psychophysiology study, or years or even decades in a longitudinal panel study.

Growth curve analysis is a data reduction technique: it summarizes longitudinal data as a smooth curve defined by relatively few parameters, either for descriptive purposes or for further inquiry. It is especially useful in developmental psychology research for measuring intraindividual change over childhood or the life course, but it can also be used to measure change at the group level in sociology or demography. For ease of exposition, this entry discusses intraindividual change, but the principles apply to other levels of change.

Before the advent of growth curve analysis, longitudinal analysis usually focused on relations from one time point to the next, or even on cross-sectional comparisons involving different cohorts of participants (and it often still does, depending on the nature of the data collection and the research question). With only two time points of data collection, longitudinal analysis is largely limited to correlation or regression: either predicting an outcome from earlier variables, including the earlier score on the outcome, or predicting a change score. If the researcher has data from at least three measurement occasions, growth curve analysis becomes a useful tool.

The core idea of growth curve analysis in longitudinal data is that the researcher can estimate a best-fit line or curve for each individual's responses over time. In subsequent analysis steps, the parameters defining those curves can be analyzed. Consider the simplest case: reading-competency tests administered at ages six, seven, and eight. With three measurement occasions, growth curve analysis involves estimating a best-fit line (plus a residual, or error, component) for each respondent. Each line is characterized by an intercept, or overall level, and a slope, or linear change over time. In this example, time would typically be coded so that the intercept represents the level at age six, with the slope representing the rate of change through age eight, as shown in Figure 1 and discussed further below. The means of these intercepts and slopes describe the sample-average trajectory, and the variances of those parameters across individuals describe the variability in the growth curves (lines). This individual variation in growth curves can then be predicted from respondent-level variables (e.g., gender, socioeconomic status, treatment condition). If the data collection includes at least four measurement occasions, more complex growth curves can be estimated, for example by adding a quadratic component of time (time squared) to model curvature, or acceleration.
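The two-step logic described above (fit a line per child, then summarize the line parameters) can be sketched in Python. This is a minimal illustration under stated assumptions: the scores are hypothetical, time is coded as years since age six so that the intercept is the age-six level, and each child's line is fit by ordinary least squares. Multilevel and SEM growth models estimate these quantities jointly and separate true between-child variance from residual error, so their estimates would differ from the raw means and variances computed here.

```python
from statistics import mean, variance

# Hypothetical reading scores at ages 6, 7, and 8 (illustrative data only).
scores = {
    "child_a": [10.0, 14.0, 17.0],
    "child_b": [12.0, 13.0, 15.0],
    "child_c": [8.0, 13.0, 19.0],
}
# Time coded as years since age six, so the intercept is the age-six level.
times = [0.0, 1.0, 2.0]


def fit_line(t, y):
    """Ordinary least-squares intercept and slope for one child's scores."""
    t_bar, y_bar = mean(t), mean(y)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


# Step 1: one best-fit line per child.
curves = {child: fit_line(times, y) for child, y in scores.items()}
intercepts = [a for a, _ in curves.values()]
slopes = [b for _, b in curves.values()]

# Step 2: summarize the curve parameters across children.
print("mean intercept:", mean(intercepts), "variance:", variance(intercepts))
print("mean slope:", mean(slopes), "variance:", variance(slopes))
```

The per-child intercepts and slopes in `curves` are the quantities that a later step would relate to respondent-level predictors such as gender or treatment condition.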


There are two primary analysis techniques for growth curve analysis: multilevel modeling (MLM) and structural equation modeling (SEM). Multilevel models, also known as hierarchical linear models or mixed models, among other terms, are based on the disaggregation of the model into multiple levels of explanation. For growth curve analysis, this typically involves a within-person level and a between-person level. The outcome is measured at the within-person level, across multiple occasions, and is directly predicted by an intercept and one or more time variables. It can also be predicted by time-varying covariates (e.g., teacher qualification) whose effects the investigator wishes to separate from the growth curve. The regression coefficients associated with some or all of these variables are, in turn, predicted by variables at the between-person level. In the reading example, there is a within-person equation for each child: a best-fit regression line fitted to that child's three scores. The intercept and slope of this line vary between children, and this variation can be predicted by child gender or other between-person variables.
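In equation form, the two-level growth model for the reading example can be written as follows. The notation is one common convention (level-1 coefficients π, level-2 coefficients β) chosen here for illustration; time is assumed to be coded as the child's age minus six so that π_{0i} is the level at age six, and gender stands in for any between-person predictor. A time-varying covariate would enter as an additional term in the level-1 equation.

```latex
\begin{aligned}
\text{Level 1 (within person):} \quad & y_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{time}_{ti} + e_{ti} \\
\text{Level 2 (between persons):} \quad & \pi_{0i} = \beta_{00} + \beta_{01}\,\mathrm{gender}_{i} + r_{0i} \\
& \pi_{1i} = \beta_{10} + \beta_{11}\,\mathrm{gender}_{i} + r_{1i}
\end{aligned}
```

Here y_{ti} is child i's score at occasion t, e_{ti} is the occasion-specific residual, and r_{0i} and r_{1i} are child-level deviations whose variances correspond to the between-child variability in intercepts and slopes.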

Figure 1

Structural equation models are used in the specific form of growth curve model known as latent trajectory analysis. In this approach, the growth parameters (intercept, slope, etc.) are modeled as latent variables that have the individual measurement occasions as indicators. The loadings of the indicators on the growth factors are typically fixed as functions of time, as shown in Figure 1. Reading ability at age six is a function only of the intercept (the child's intercept value multiplied by a loading of 1; the slope's loading at age six is 0). Age-seven reading scores reflect the intercept (again multiplied by 1) plus the slope multiplied by 1; at age eight, the slope value is multiplied by 2. The slope is thus the estimated rate of improvement from one year to the next. Finally, as in multilevel models, the latent growth factors can be predicted by other variables in the model.
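The loading pattern described above can be written as a matrix (called `LAMBDA` below) whose rows are occasions and whose columns are the intercept and slope factors. In a latent growth curve model, the occasion means are implied as Lambda times alpha and the occasion covariances as Lambda Psi Lambda' + Theta, where alpha holds the growth-factor means, Psi their covariance matrix, and Theta the residual variances (assumed diagonal here). The sketch computes both in plain Python; the symbol names follow a common convention, all parameter values are hypothetical, and a real analysis would estimate them with SEM software rather than fix them by hand.

```python
# Loadings for ages 6, 7, 8; columns are (intercept, slope).
LAMBDA = [
    [1.0, 0.0],  # age 6: intercept only
    [1.0, 1.0],  # age 7: intercept + 1 x slope
    [1.0, 2.0],  # age 8: intercept + 2 x slope
]


def implied_means(loadings, factor_means):
    """Model-implied occasion means: Lambda @ alpha."""
    return [sum(l * a for l, a in zip(row, factor_means)) for row in loadings]


def implied_cov(loadings, psi, theta):
    """Model-implied covariance: Lambda Psi Lambda' + Theta (Theta diagonal)."""
    n, k = len(loadings), len(psi)
    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(
                loadings[i][p] * psi[p][q] * loadings[j][q]
                for p in range(k)
                for q in range(k)
            )
            # Residual variances appear only on the diagonal.
            sigma[i][j] = shared + (theta[i] if i == j else 0.0)
    return sigma


# Hypothetical parameters: factor means, factor covariances, residual variances.
alpha = [10.0, 3.5]
psi = [[4.0, 0.5], [0.5, 1.0]]
theta = [2.0, 2.0, 2.0]

print(implied_means(LAMBDA, alpha))  # [10.0, 13.5, 17.0]
for row in implied_cov(LAMBDA, psi, theta):
    print(row)
```

Because the loadings are fixed rather than estimated, the growth factors are defined entirely by how time is coded: changing the age-six row to a slope loading of -2 and the age-eight row to 0 would move the intercept to age eight.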

The same models can be estimated in SEM (modeling means and intercepts, in addition to variances and covariances) as in MLM. In the most common applications, the only differences lie in the handling of the occasion-specific residuals: MLM typically assumes a single residual variance across occasions, whereas SEM typically estimates a separate residual variance for each occasion, and either specification can be constrained to match the other. MLM has the advantage that software commonly reports useful statistics, such as the apportionment of variance between the two levels. It is also possible to create elaborate nesting structures (e.g., occasions within persons, persons within classrooms, classrooms within schools). SEM approaches have the advantage of greater flexibility in modeling. For example, it is possible to model the slopes as dependent on the intercepts, and it is relatively simple to incorporate a second or even third growth process in the same model. Also, as is characteristic of SEM, the indicators of the growth curve can themselves be latent variables, measured by observed indicators taken over time.
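The apportionment of variance between levels is often summarized as an intraclass correlation from an unconditional (intercept-only) multilevel model: the between-person variance divided by the total variance. A minimal sketch, assuming hypothetical variance-component estimates; the function name is illustrative and not part of any particular library.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """Proportion of total outcome variance that lies between persons."""
    if between_var < 0 or within_var < 0:
        raise ValueError("variance components must be non-negative")
    return between_var / (between_var + within_var)


# Hypothetical estimates: person-level variance and occasion-level residual variance.
icc = intraclass_correlation(between_var=6.0, within_var=2.0)
print(f"ICC = {icc:.2f}")  # ICC = 0.75
```

Here three quarters of the outcome variance would lie between children and one quarter within children across occasions.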

In either approach, not all respondents need to have the same occasions of measurement. Because the key variables are the growth parameters, time need not be coded identically for different respondents: if a child's reading is measured at age 7.5, the slope would simply have a loading (multiplier) of 1.5, reflecting the time elapsed since the intercept at age six. As a consequence, even in a fixed-occasion design, the methods easily accommodate missed measurement occasions, provided the data are missing at random (MAR). The consequence of fewer measurement points for an individual is simply that the individual's data contribute less information to the solution of the model.
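A minimal sketch of individually varying measurement times, assuming the intercept is anchored at age six so that each slope loading is simply the child's age minus six. The records are hypothetical: one child was tested at 7.5 instead of 7, and another missed the age-eight session. The per-child least-squares fit is only a descriptive stand-in; in MLM, for example, a child with fewer occasions contributes less information and that child's estimated curve is typically shrunk toward the sample-average trajectory, which this sketch does not do.

```python
BASE_AGE = 6.0  # intercept anchored at age six


def slope_loading(age: float) -> float:
    """Slope loading = years elapsed since the intercept age."""
    return age - BASE_AGE


def fit_line(ages, scores):
    """OLS intercept and slope for one child, using whichever occasions exist."""
    t = [slope_loading(a) for a in ages]
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(scores) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, scores))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


# Hypothetical records: (ages at testing, reading scores).
records = {
    "child_a": ([6.0, 7.0, 8.0], [10.0, 14.0, 17.0]),
    "child_b": ([6.0, 7.5, 8.0], [12.0, 14.0, 15.0]),  # tested at 7.5, not 7
    "child_c": ([6.0, 7.0], [8.0, 13.0]),              # missed age 8
}

for child, (ages, ys) in records.items():
    loadings = [slope_loading(a) for a in ages]
    print(child, "slope loadings:", loadings, "fit:", fit_line(ages, ys))
```

With only two occasions, `child_c`'s line passes exactly through both points and leaves no residual, which illustrates why such a child carries less information about individual growth.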


Much can be built on the basis of a growth curve analysis. For example, more sophisticated models of change can be estimated, especially in the SEM latent growth framework. The growth curve, although a longitudinal model, is essentially a time-invariant estimate of a respondent's change over time: the curve parameters are constant. Typically there will be deviations from that curve, because it is necessarily a simplification of the data (three data points can rarely be fit exactly by a line defined by two parameters). One way to model those deviations is the autoregressive latent trajectory (ALT) model presented by Kenneth Bollen and Patrick Curran (2003). In this model, the time-invariant elements of change (intercept and slope) are modeled by the growth curve, while time-specific deviations from the curve (the extent and direction in which measured values differ from predicted values) are captured by an autoregressive model, in which each time point is linearly predicted by previous points, so that deviations propagate through time.
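The idea of autoregressive deviations propagating around a growth curve can be illustrated by computing expected values by hand. This is a deliberately simplified sketch of the description above, not Bollen and Curran's full ALT specification: it assumes a linear curve, a single hypothetical autoregressive coefficient `rho` applied to the deviations, hand-chosen shocks, and four hypothetical annual occasions with time coded as years since age six.

```python
def trajectory_with_ar_deviations(intercept, slope, rho, shocks):
    """Linear growth curve plus first-order autoregressive deviations.

    y_t = intercept + slope * t + d_t, where d_t = rho * d_{t-1} + shock_t.
    """
    values = []
    deviation = 0.0
    for t, shock in enumerate(shocks):
        # The previous deviation carries forward, scaled by rho.
        deviation = rho * deviation + shock
        values.append(intercept + slope * t + deviation)
    return values


# Hypothetical: one positive shock at age 7 (t = 1) that decays by half each year.
ys = trajectory_with_ar_deviations(10.0, 3.5, 0.5, [0.0, 2.0, 0.0, 0.0])
print(ys)  # [10.0, 15.5, 18.0, 21.0]
```

Without the shock the curve would read 10.0, 13.5, 17.0, 20.5; the single time-specific disturbance lifts every later value by a shrinking amount (2.0, then 1.0, then 0.5), while the intercept and slope themselves stay fixed.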

Hypotheses about the interrelations over time between two variables are common in developmental research. Historically, these have typically been studied in an autoregressive cross-lagged path analysis model (i.e., a model in which each variable at each time point is affected by the previous measurements of both the same variable and the other variable). Growth curve analysis extends this to a bivariate ALT model, in which the relations among the growth parameters of the two processes are modeled according to hypothesis, and the time-specific change in one variable is also influenced by the other variable.
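For comparison, the traditional autoregressive cross-lagged model described above can be sketched as a pair of regression equations applied wave by wave. The two processes (labeled `x` and `y`), the coefficient names, and all coefficient values are hypothetical, and the sketch propagates only expected values, without the growth factors that a bivariate ALT model would add.

```python
from typing import Dict, List, Tuple


def cross_lagged_step(x_prev: float, y_prev: float,
                      coef: Dict[str, float]) -> Tuple[float, float]:
    """Expected values at the next wave of a bivariate cross-lagged model."""
    # Each variable depends on its own previous value (autoregressive path)
    # and on the other variable's previous value (cross-lagged path).
    x_next = coef["x0"] + coef["xx"] * x_prev + coef["xy"] * y_prev
    y_next = coef["y0"] + coef["yy"] * y_prev + coef["yx"] * x_prev
    return x_next, y_next


def run_waves(x_start: float, y_start: float, coef: Dict[str, float],
              waves: int) -> List[Tuple[float, float]]:
    """Propagate expected values forward for the given number of waves."""
    path = [(x_start, y_start)]
    for _ in range(waves):
        path.append(cross_lagged_step(*path[-1], coef))
    return path


# Hypothetical intercepts, autoregressive paths, and cross-lagged paths.
coef = {"x0": 2.0, "xx": 0.8, "xy": 0.3, "y0": 1.0, "yy": 0.7, "yx": 0.1}
for wave, (x, y) in enumerate(run_waves(10.0, 20.0, coef, waves=3)):
    print(f"wave {wave}: x = {x:.2f}, y = {y:.2f}")
```

Testing a hypothesis such as "x drives later y more than y drives later x" amounts to comparing the cross-lagged paths `yx` and `xy` once they are estimated from data.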


Two major applications of growth curve analysis, beyond the descriptive aspects discussed so far, are the analysis of intervention effects and the estimation of predictive models. Continuing the example, in an intervention study to improve reading ability, it is possible to estimate a growth curve with one intercept and two slopes (or higher-order curves). The value of the second slope is constrained to zero for each child in the control condition; for children in the treatment condition, it is added to the model beginning with the time point after the intervention is initiated. In this case, the first slope represents the baseline growth, and the second slope represents the deviation from the baseline trajectory induced by the intervention.
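The loading scheme for this intervention model can be sketched as follows, assuming four hypothetical annual occasions coded 0 to 3 (years since age six), an intervention initiated at time 1, and hypothetical values for the intercept and both slopes. Each row gives the loadings on the intercept, the baseline slope, and the intervention slope; the intervention-slope loading stays at zero for control children and, for treated children, counts the years elapsed since the intervention began.

```python
from typing import List, Tuple

# (intercept, baseline slope, intervention slope) loadings for one occasion.
Loadings = Tuple[float, float, float]


def piecewise_loadings(times: List[float], treated: bool,
                       start: float) -> List[Loadings]:
    """Loadings for one child under a baseline-plus-intervention growth model."""
    rows = []
    for t in times:
        # Zero for controls; for treated children, years since `start`.
        extra = max(0.0, t - start) if treated else 0.0
        rows.append((1.0, t, extra))
    return rows


def predicted(rows: List[Loadings], intercept: float, slope: float,
              slope_tx: float) -> List[float]:
    """Expected scores implied by the loadings and growth parameters."""
    return [a * intercept + b * slope + c * slope_tx for a, b, c in rows]


times = [0.0, 1.0, 2.0, 3.0]
control = piecewise_loadings(times, treated=False, start=1.0)
treatment = piecewise_loadings(times, treated=True, start=1.0)

print([row[2] for row in treatment])            # [0.0, 0.0, 1.0, 2.0]
print(predicted(control, 10.0, 3.5, 1.5))       # [10.0, 13.5, 17.0, 20.5]
print(predicted(treatment, 10.0, 3.5, 1.5))     # [10.0, 13.5, 18.5, 23.5]
```

The two predicted trajectories coincide until the intervention starts and then diverge by the intervention slope each year, which is the deviation from baseline that the second slope is meant to capture.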

A second application is the development of predictive models. If the growth curve is well established in observed data, it is possible to project the curve beyond the observed data and make reasonable predictions of future outcomes. Of course, this carries the risk common to any regression model in which prediction is extended beyond the observed range of the predictors, but applied with caution, it can be a highly useful tool.
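Projection from a fitted linear curve is simple arithmetic, which is part of why it is easy to overuse. A minimal sketch, assuming a hypothetical fitted intercept and slope from the reading example (observed ages six through eight): the function returns the projected score and issues a warning whenever it is asked to extrapolate beyond the observed ages. The names and the warning behavior are illustrative choices, not a standard API.

```python
import warnings

OBSERVED_MIN_AGE, OBSERVED_MAX_AGE = 6.0, 8.0  # ages covered by the data
BASE_AGE = 6.0  # age at which the intercept is defined


def project(intercept: float, slope: float, age: float) -> float:
    """Projected score at `age` from a linear growth curve."""
    if not OBSERVED_MIN_AGE <= age <= OBSERVED_MAX_AGE:
        warnings.warn(
            f"age {age} lies outside the observed range "
            f"{OBSERVED_MIN_AGE}-{OBSERVED_MAX_AGE}; this is an extrapolation",
            stacklevel=2,
        )
    return intercept + slope * (age - BASE_AGE)


# Hypothetical fitted values for one child.
print(project(10.0, 3.5, 7.0))   # 13.5, within the observed range
print(project(10.0, 3.5, 10.0))  # 24.0, extrapolated (emits a warning)
```

A linear curve projected far enough would predict reading scores that rise without limit, one concrete form of the extrapolation risk described above.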
