Latent variables are commonly used in the social sciences. Many variables of interest, whether psychological measures such as depression or sociological concepts such as socioeconomic status, cannot be measured directly. Factor analysis, latent class analysis, structural-equation models, error-in-variable models, and item-response theory are all examples of models that incorporate latent variables.
The basic statistical concept of latent variable analysis is simple. Latent variables belong to an abstract level of analysis that cannot be directly observed or measured. To estimate the numerical values of model parameters from empirical data, we must link the unobservable conceptual variables to observable indicators. The measurement model for socioeconomic status (SES), an example of a formative model, may make this distinction between the conceptually abstract and the observable levels of analysis clearer. A researcher may observe income, educational level, and neighborhood as indicators (manifest variables) of SES (the latent variable). Latent variable models provide a means to parse out measurement errors by combining information across observed variables (using the correlations among them), and they allow for the estimation of complex causal models. Such measurement errors may include faulty respondent memory or systematic errors made in the survey process.
Latent variable analysis is parallel to factor analysis. In modern test-theory models, the relation between the latent variable and the observed scores (item responses) is mathematically explicit. This relation takes the form of a generalized regression function of the observed scores on the latent variable. The regression function may differ in form: it is linear for the factor models and logistic for the probabilistic models (Mellenbergh 1994). Researchers should also decide whether to treat the underlying latent variable(s) as continuous or discrete. Further discussion can be found in Tom Heinen’s demonstrations (1996).
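The two forms of the regression function can be written out as a brief sketch. The notation is an assumption introduced here for illustration and is not taken from Mellenbergh (1994): X_j is the observed score on item j, \theta is the latent variable, \nu_j and \lambda_j are a hypothetical intercept and loading, and a_j and b_j are a hypothetical discrimination and difficulty parameter. The logistic line shows one common choice, the two-parameter logistic model for a dichotomous item.

```latex
% Linear form (factor model): the expected item score is linear in the latent variable
E(X_j \mid \theta) = \nu_j + \lambda_j \theta

% Logistic form (probabilistic model for a dichotomous item):
% the probability of an affirmative response is a logistic function of the latent variable
P(X_j = 1 \mid \theta) = \frac{\exp\{a_j(\theta - b_j)\}}{1 + \exp\{a_j(\theta - b_j)\}}
```

In both cases the observed score is regressed on the latent variable; only the shape of the curve relating the two differs.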
In psychological studies, researchers may adopt a reflective model rather than a formative model because it is the standard conceptualization of measurement in psychology. This model specifies a pattern of covariation between the indicators that can be fully explained by a regression on the latent variable. That is, the indicators are independent after conditioning on the latent variable (the assumption of local independence). A reflective model of the latent variable depression might use responses to items such as "I am sad all the time," "I often feel helpless," and "I often feel my life is empty." The reflective model implies that a depressed person will be more inclined to answer such items affirmatively than a mentally healthy person. In ordinary language, depression comes first and "leads to" the item responses. In mathematical terms, the model implies a regression of the indicators on the latent variable, whereas in the SES model (a formative model) the relationship between the indicators and the latent variable is reversed. In other words, variation in the SES indicators precedes variation in the latent variable; SES changes as a result of an increase in income and/or education, and not the other way around.
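Local independence in a reflective model can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis of real data: the latent score `theta`, the three `loadings`, and the use of continuous item scores with normally distributed noise are all hypothetical choices made for illustration. Two items correlate because they share the latent variable, and that correlation should nearly vanish once the latent variable is regressed out.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical latent depression score (standard normal by assumption).
theta = rng.normal(size=n)

# Reflective model: each item is a regression on theta plus independent noise.
# The loadings are illustrative values, not estimates from any study.
loadings = np.array([0.8, 0.7, 0.6])
items = theta[:, None] * loadings + rng.normal(size=(n, 3))


def residualize(y, x):
    """Return the part of y not explained by a linear regression on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


# Marginal correlation: the items covary because they share theta.
marginal_r = np.corrcoef(items[:, 0], items[:, 1])[0, 1]

# Partial correlation given theta: close to zero under local independence.
partial_r = np.corrcoef(
    residualize(items[:, 0], theta),
    residualize(items[:, 1], theta),
)[0, 1]

print(f"marginal correlation of items 1 and 2: {marginal_r:.3f}")
print(f"partial correlation given theta:       {partial_r:.3f}")
```

In practice the latent variable is not observed, so this check is only possible in a simulation; with real data, local independence is an assumption of the model rather than something computed directly in this way.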
In sum, latent variable theory signifies both realism and constructivism. Latent variables in a formative model are more a summary of the observed variables, while a reflective model implies entity realism about the latent variable. A causal implication between observable indicators and the latent variable thus is not a strong assumption. Researchers are advised to be cautious when interpreting this relation in empirical studies (Borsboom et al. 2003).