To understand the characteristics and uses of longitudinal data, it helps first to understand what cross-sectional data are. Social science researchers use cross-sectional data, which are drawn from surveys of samples of individuals or groups (aggregate data), to investigate differences among those individuals or groups. Unlike longitudinal data, cross-sectional data are collected only once. Like cross-sectional data, longitudinal data may be used to understand differences among individuals, and cross-sectional data that are collected repeatedly could be considered a subset of longitudinal data. However, the major purpose of gathering longitudinal data is to investigate changes within the sample over a period of time, ranging from a few months to several decades (life-course studies). Researchers explore many different aspects of longitudinal data, including the distribution of values, temporal trends, anomalies, and the relationships among multiple responses and covariates over time.
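The contrast between a one-time snapshot and change within the same subjects can be sketched in a few lines of Python. This is only an illustration: the records, subject identifiers, scores, and function names below are hypothetical and are not taken from any study discussed here.

```python
# Hypothetical panel records: (subject_id, wave_year, score).
# All values are invented for illustration.
records = [
    ("a", 1997, 10), ("a", 2002, 14),
    ("b", 1997, 20), ("b", 2002, 19),
    ("c", 1997, 15), ("c", 2002, 15),
]

def cross_section(records, year):
    """Return {subject: score} for one wave, i.e. a one-time snapshot."""
    return {sid: score for sid, wave, score in records if wave == year}

def within_subject_change(records, start, end):
    """Return {subject: change in score} for subjects observed in both waves."""
    first = cross_section(records, start)
    last = cross_section(records, end)
    return {sid: last[sid] - first[sid] for sid in first if sid in last}

# Differences among subjects at one point in time (cross-sectional view).
snapshot = cross_section(records, 1997)

# Change within each subject between waves (longitudinal view).
change = within_subject_change(records, 1997, 2002)
```

The snapshot can only rank subjects against one another; the second function needs at least two observations of the same subject, which is what a longitudinal design supplies.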

Researchers who make use of longitudinal data are often most interested in changes within individuals, considered alongside changes in other relevant variables, such as ecological changes, situational factors, changes in social interaction over time, and developmental life stages. For example, the 1997 wave of the Panel Study of Income Dynamics, which updated its sample composition to be representative of the current population of the United States, was intended to collect detailed information about economic and social disparities in child development on a national scale. A follow-up wave of the same subjects was conducted in 2002. As presented by Robert Mare and Margot Jackson in their 2004 article, researchers used these data to investigate how residential mobility and neighborhood change contribute to the overall socioeconomic variation in children’s neighborhoods.

Criminologists use longitudinal data to survey trends in substance use (or other deviant behaviors) within a cohort, or to follow an individual’s substance use (or other deviant behaviors) over time. Sociologists have used the National Longitudinal Youth Survey, a longitudinal study that presents self-reports of deviant and illegal behavior from the initial wave of 1976 to the 2003 wave (with data collection ongoing); it also presents other variables relevant to families, schools, and peer association, aiding sociologists in their investigation of the life cycles of families, immigration mobility, and their consequences. The political scientists Michael Delli Carpini and Scott Keeter (1991) used the U.S. censuses over a period of decades to determine whether contemporary U.S. citizens were better or more poorly informed about politics than were citizens of earlier generations. Studies of this kind are also referred to as pseudo-panel studies because they try to investigate changes over time in a certain population while choosing subjects independently each time. In other words, they are not longitudinal studies in the narrow sense, although some researchers may view them as such. The social psychologists Howard Kaplan and Cheng-Hsien Lin (2000) repeatedly surveyed the same subjects to investigate whether the link between mental health and deviance is reciprocal or unidirectional and, more important, whether the two variables covary over time. This study took particular advantage of the causal implications of longitudinal data to resolve confusion over the causal direction suggested by theories.


Longitudinal datasets provide several advantages over cross-sectional data. First, such datasets can track changes among subjects over a short or a long period of time. Social scientists use such data to explore the potential trends underlying social mobility and unexpected trends in a society. Second, longitudinal datasets indicate the nature of the covariation among variables over time, which clarifies their relations. In other words, controlling for earlier measures of these variables helps social scientists determine the strength of the relations among them. Third, the causal relations between variables can be verified through a clear time sequence of these variables. Researchers can examine factors that are theoretically reciprocal, such as deviant peer association and deviant behavior, only by using longitudinal data analysis. Finally, longitudinal data can indicate both developmental (life-stage) and historical trends; cross-sectional data can only partially indicate or describe developmental trends because the confounding effects of historical trends cannot be taken into account in the analysis.
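One simple way to illustrate controlling for an earlier measure is a partial correlation: the association between an earlier predictor and a later outcome after the earlier level of that outcome has been removed from both. The sketch below is a stand-in chosen for brevity, not the method of any study cited here, and the variable names and two-wave scores are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def partial_corr(x, y, control):
    """Correlation of x and y after removing the linear effect of control.

    Assumes neither x nor y is perfectly correlated with control,
    otherwise the denominator is zero.
    """
    r_xy = pearson(x, y)
    r_xc = pearson(x, control)
    r_yc = pearson(y, control)
    return (r_xy - r_xc * r_yc) / sqrt((1 - r_xc ** 2) * (1 - r_yc ** 2))

# Hypothetical two-wave panel: one list entry per subject, same order in each list.
peers_t1 = [1, 2, 3, 4, 5, 6]      # deviant peer association, wave 1
deviance_t1 = [2, 1, 3, 5, 4, 6]   # deviant behavior, wave 1
deviance_t2 = [2, 3, 3, 5, 6, 6]   # deviant behavior, wave 2

# Association of wave-1 peers with wave-2 deviance, net of wave-1 deviance.
lagged_effect = partial_corr(peers_t1, deviance_t2, control=deviance_t1)
```

Swapping the roles of the two variables (wave-1 deviance predicting wave-2 peer association, controlling for wave-1 peer association) gives the other half of a reciprocal comparison; both halves require repeated measures of the same subjects.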

Nevertheless, longitudinal data are collected at high cost. First, because they require at least two waves, the cost of their collection is much higher than that of cross-sectional data in terms of both labor required and economic demands. Second, if the initial wave of a dataset was biased for any reason, the follow-ups will amplify the bias because of the repeated survey of the sampled subjects. Third, repeat respondents may develop a pattern of response attitudes to make every follow-up interview easier, which may lead to invalid responses (an effect known as panel conditioning). The developing pattern of interactions between an interviewer and an interviewee may also result in invalid responses. For example, an interviewer may learn that subjects would like to avoid certain sensitive questions, so, to avoid unpleasant confrontations, the interviewer may assume that the subject would give the same or a similar reply as in the earlier sessions of the interview.

Finally, if researchers use a type of longitudinal data collection known as panel study design, panel attrition caused by the death or withdrawal of some participants may introduce unexpected bias into the data analyses. For example, studies that collect data about delinquent behavior among juveniles may lose some of the more serious delinquents surveyed in the first wave if they in fact become criminals, are arrested and imprisoned, or die as a result of their criminal activities. Such first-wave respondents are also less likely to hold a long-term job and thus will be difficult to locate for future interviews. It is not surprising that these subjects are quite different from the rest of the sample. Conclusions drawn from analyses of the longitudinal data are thus at risk.
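A common first check for this kind of bias is to compare the first-wave scores of subjects who stayed in the panel with those of subjects who dropped out. The following is a minimal sketch using only the Python standard library; the record layout, the scores, and the `attrition_report` helper are hypothetical.

```python
from statistics import mean

# Hypothetical first-wave records:
# (subject_id, delinquency_score, retained_in_wave_2)
wave1 = [
    ("s1", 1, True), ("s2", 2, True), ("s3", 2, True), ("s4", 3, True),
    ("s5", 8, False), ("s6", 9, False),
]

def attrition_report(wave1):
    """Compare baseline scores of retained subjects and dropouts."""
    stayers = [score for _, score, kept in wave1 if kept]
    leavers = [score for _, score, kept in wave1 if not kept]
    return {
        "attrition_rate": len(leavers) / len(wave1),
        "baseline_mean_all": mean(score for _, score, _ in wave1),
        # None when a group is empty, since a mean is undefined there.
        "baseline_mean_stayers": mean(stayers) if stayers else None,
        "baseline_mean_leavers": mean(leavers) if leavers else None,
    }

report = attrition_report(wave1)
```

In this invented sample the dropouts had much higher first-wave scores than the stayers, so any later wave would describe a less delinquent group than the one originally sampled.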


Longitudinal data take different forms. Trend studies use the same measurements to investigate samples of a society at intervals to see whether the society has held certain attitudes or values over time. Such studies survey a different sampled group in every wave of data collection and then compare the results across waves. Research on political preferences in different decades often makes use of trend data. A potential problem in trend studies is that the samples collected in different waves may not be comparable. For example, a common problem is that the sample interviewed in an early wave is less educated than the sample interviewed in a later wave because educational attainment tends to increase in subsequent generations. Since educational attainment is often closely related to other social and individual factors, the discrepancy in educational attainment between generations may make data that suggest changing trends in certain social attitudes and values less credible.

Cohort data tighten the selection of samples by focusing on a cohort or a subpopulation in a society. Although a cohort study may sample different groups out of the whole population, every sampled group shares a common feature. Its members might have been born within a certain span of years or raised during a particular societal event, such as World War II or the civil rights movement. The analyses of cohort data provide information about how a generation’s attitudes may change over time. Cohort studies thereby avoid the generational differences that confound trend studies.

Panel data collect information from the same group of subjects over time and are intended specifically to examine changes within individuals. Whereas trend data and cohort data generally indicate overall changes in the society or in a studied subgroup, panel studies attempt to examine how each subject’s attitudes and behaviors changed individually over the studied years. This method provides the most comprehensive data on change, but its most serious problem is sample attrition, and researchers must be aware of that risk when drawing conclusions based on their studies.
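The three designs differ mainly in who is sampled at each wave, which a small classifier can make concrete. This is a deliberately simplified sketch: the function name, the `cohort_span` threshold, and the rule that a panel's later waves contain only first-wave subjects are assumptions made for illustration, and real studies often blend designs.

```python
def classify_design(waves, birth_year, cohort_span=5):
    """Label a multi-wave study as 'panel', 'cohort', or 'trend'.

    waves: list of sets of subject ids, one set per wave.
    birth_year: dict mapping each subject id to a birth year.
    cohort_span: assumed maximum birth-year range for a single cohort.
    """
    if len(waves) < 2:
        raise ValueError("a longitudinal design needs at least two waves")

    # Panel: later waves re-interview first-wave subjects
    # (a subset is allowed, which corresponds to attrition).
    first = waves[0]
    if all(wave <= first for wave in waves[1:]):
        return "panel"

    # Cohort: different people each wave, all born within a narrow span.
    years = {birth_year[sid] for wave in waves for sid in wave}
    if max(years) - min(years) <= cohort_span:
        return "cohort"

    # Trend: different people each wave, drawn from the whole population.
    return "trend"

# Hypothetical examples.
panel = classify_design([{"a", "b", "c"}, {"a", "b"}],
                        {"a": 1950, "b": 1972, "c": 1961})
cohort = classify_design([{"a", "b"}, {"c", "d"}],
                         {"a": 1945, "b": 1946, "c": 1945, "d": 1947})
trend = classify_design([{"a", "b"}, {"c", "d"}],
                        {"a": 1940, "b": 1975, "c": 1952, "d": 1988})
```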

Longitudinal data have grown in popularity among social scientists since the 1970s and became mainstream in the 1990s. New, convenient statistical programs for time-series analysis, latent growth curve models, hierarchical linear models, event history analysis, and structural equation models have greatly aided researchers using panel studies in clarifying relations among psychological and behavioral variables. Most likely, the scientific benefits of longitudinal data will make them the dominant data form.
