FLYNN EFFECT (Social Science)

The term Flynn effect refers to the worldwide phenomenon of markedly increasing mean performance on standardized IQ tests over time. Most current IQ tests are designed to have a population mean of 100 and a standard deviation of 15 at the time they are developed. The mean and standard deviation are set by administering the test to a large group of individuals selected to be representative of the population as a whole (a process referred to as "standardization" or "norming"). However, a growing body of evidence suggests that mean population performance on IQ tests has improved markedly in the decades since they were first introduced in 1905. Within several years of a test’s introduction, its mean of 100 becomes obsolete, and scores on it become inflated across the board. Periodic renorming of IQ tests (typically at twelve- to twenty-five-year intervals) has helped mask the magnitude of this increase. Because each new standardization sample performs better than the last, test-takers typically have to answer more (or harder) questions correctly to obtain the same score on a newer test as on an older one, which keeps the mean score at 100.
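
The norming logic can be illustrated with a minimal sketch, assuming a single linear conversion from raw score to IQ (real tests typically use age-specific norm tables instead). The function names `deviation_iq` and `raw_for_iq` and all raw-score figures are hypothetical, chosen only to show what happens when a later population whose raw performance has risen is still scored against old norms.

```python
def deviation_iq(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Express a raw score as an IQ on a scale with mean 100 and SD 15,
    relative to a norming sample with the given raw mean and SD."""
    z = (raw - norm_mean) / norm_sd
    return 100 + 15 * z


def raw_for_iq(iq: float, norm_mean: float, norm_sd: float) -> float:
    """Inverse of deviation_iq: the raw score needed to earn a given IQ."""
    return norm_mean + norm_sd * (iq - 100) / 15


# Hypothetical norming sample: raw scores average 40 with SD 8.
OLD_MEAN, OLD_SD = 40.0, 8.0
# Hypothetical later population whose average raw score has risen to 42.
NEW_MEAN, NEW_SD = 42.0, 8.0

# The later population's average test-taker, scored against the old norms.
print(f"Old norms: IQ {deviation_iq(NEW_MEAN, OLD_MEAN, OLD_SD):.2f}")  # 103.75
# The same performance after renorming on the later population.
print(f"New norms: IQ {deviation_iq(NEW_MEAN, NEW_MEAN, NEW_SD):.2f}")  # 100.00
# Raw score needed for an IQ of 100 under each set of norms.
print(raw_for_iq(100, OLD_MEAN, OLD_SD), raw_for_iq(100, NEW_MEAN, NEW_SD))  # 40.0 42.0
```

Under these assumptions the average person looks above average on the obsolete norms, and after renorming must answer more questions correctly (42 rather than 40 raw points) to earn the same IQ of 100.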

The degree and scope of improved IQ test performance were not broadly known until James Flynn (b. 1934), a political scientist at the University of Otago in New Zealand, published two seminal articles on the topic in Psychological Bulletin in 1984 and 1987. Flynn reviewed dozens of studies in which the same groups took two or more IQ tests standardized at different times. He noted that in these studies a group’s mean score on the test with the more recent standardization sample was nearly always lower than its score on the test with the older standardization sample.

Flynn has estimated the size of the Flynn effect on the Wechsler and Stanford-Binet series of IQ tests (the most widely used IQ tests in the United States) at about 3 points per decade, or about .3 points per year (Flynn 1984, 2006). This rate has been remarkably consistent across time periods and across tests within these series. It is not uniform, however, across all varieties of IQ tests, or even across subtests within a particular test. The largest gains (.5 points per year or more) have been found on Raven’s Progressive Matrices, a nonverbal pattern-recognition test. The smallest gains (near 0 points per year) have been on Wechsler Verbal IQ subtests such as vocabulary, information, and arithmetic.

Because administering standardized IQ tests is time-consuming and costly, they are typically given only to students who are being considered for special education or gifted and talented programs. The Flynn effect has been found to particularly affect the educational classification of students tested for eligibility for these services shortly before and shortly after a revised IQ test is released.

The impact of the Flynn effect on children being tested for mental retardation services in the early 1990s was substantial. The IQ criterion for mental retardation is typically a score of 70 or below, two standard deviations below the mean of 100 on current IQ tests (allowances for measurement error typically permit a score of up to 75). When the Wechsler Intelligence Scale for Children-Third Edition (WISC-III) supplanted the Wechsler Intelligence Scale for Children-Revised (WISC-R) in 1991, the WISC-R norms were nineteen years old. Tomoe Kanaya, Matthew Scullin, and Stephen Ceci (2003) found that children in the mild mental retardation and borderline IQ ranges scored more than five points lower on the WISC-III than on the WISC-R, a gap similar to the one found for children in the average range of intelligence. This difference more than doubled the number of children eligible for mental retardation services on the basis of their IQ scores, because about 2.27 percent of children would be expected to score 70 or below on the WISC-III at the time of its standardization, whereas by 1991 the obsolete WISC-R was capturing only the bottom 0.87 percent. Scullin (2006) found that in forty-three states, and in the United States as a whole, a long and steady decline in the percentage of schoolchildren receiving mental retardation services during the 1980s and early 1990s ended, and indeed reversed, around the time the WISC-III was introduced.
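
The two percentages follow from the normal curve. The sketch below reproduces them under stated assumptions: IQ scores are normally distributed with an SD of 15, and the population's mean on the WISC-R had drifted upward by about .3 points per year (the rate cited earlier) over the nineteen years since norming. The helper name `share_at_or_below` is hypothetical; `statistics.NormalDist` is from Python's standard library.

```python
from statistics import NormalDist

SD = 15.0
CUTOFF = 70.0         # IQ criterion for mental retardation
RATE_PER_YEAR = 0.3   # approximate Flynn effect on Wechsler tests (assumed linear)
NORM_AGE_YEARS = 19   # age of the WISC-R norms when the WISC-III appeared


def share_at_or_below(cutoff: float, population_mean: float, sd: float = SD) -> float:
    """Fraction of a normal population scoring at or below the cutoff."""
    return NormalDist(mu=population_mean, sigma=sd).cdf(cutoff)


# Freshly normed test: the population mean is 100 by construction.
fresh = share_at_or_below(CUTOFF, 100.0)

# Obsolete norms: assume the population's mean on the old test has risen by
# rate * years (about 5.7 points), so fewer people fall at or below 70.
drifted_mean = 100.0 + RATE_PER_YEAR * NORM_AGE_YEARS
stale = share_at_or_below(CUTOFF, drifted_mean)

print(f"Fresh norms: {fresh:.2%} at or below {CUTOFF:.0f}")  # roughly 2.3%
print(f"Stale norms: {stale:.2%} at or below {CUTOFF:.0f}")  # roughly 0.9%
print(f"Ratio: {fresh / stale:.1f}x")                        # between 2 and 3
```

The ratio of the two shares is what "more than doubled" refers to: the same cutoff of 70 captures well over twice as many children on fresh norms as on nineteen-year-old ones.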

As the norms of an IQ test grow older, the Flynn effect increases the number of children eligible for learning disability services in jurisdictions where eligibility requires a significant discrepancy between a child’s IQ score and his or her performance on an achievement test: inflated IQ scores relative to achievement make a significant discrepancy more likely. Similarly, more children become eligible for gifted and talented programs over time, as the IQ cutoff becomes easier to meet. Once a newly standardized test is introduced, these trends reverse, and it becomes harder to meet both the IQ-achievement discrepancy criterion for learning disability and the IQ threshold for gifted and talented services.
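
A minimal sketch of this mechanism, assuming linear drift of .3 points per year and holding achievement scores fixed so that only the aging of the IQ norms varies. The thresholds (`DISCREPANCY_POINTS`, `GIFTED_CUTOFF`), the student records, and the helper names are all hypothetical; actual eligibility criteria differ by state and district.

```python
# Hypothetical eligibility rules; real criteria vary by state and district.
DISCREPANCY_POINTS = 22.5  # e.g., a 1.5-SD gap on a scale with SD 15
GIFTED_CUTOFF = 130.0      # e.g., two SDs above the mean
RATE_PER_YEAR = 0.3        # approximate Flynn effect, assumed linear

# Hypothetical students: (IQ on up-to-date norms, achievement standard score).
STUDENTS = [(100, 80), (95, 74), (105, 80), (128, 120)]


def scored_iq(current_iq: float, norm_age_years: float) -> float:
    """IQ as scored on norms of the given age, assuming linear drift."""
    return current_iq + RATE_PER_YEAR * norm_age_years


def count_eligible(students, norm_age_years: float) -> tuple[int, int]:
    """Count students meeting the discrepancy rule and the gifted cutoff.
    Achievement scores are held fixed to isolate the effect of aging IQ norms."""
    ld = gifted = 0
    for iq, achievement in students:
        score = scored_iq(iq, norm_age_years)
        if score - achievement >= DISCREPANCY_POINTS:
            ld += 1
        if score >= GIFTED_CUTOFF:
            gifted += 1
    return ld, gifted


for age in (0, 10):
    ld, gifted = count_eligible(STUDENTS, age)
    print(f"Norms {age} years old: {ld} LD-eligible, {gifted} gifted-eligible")
```

With these invented inputs, freshly normed scores flag one student under the discrepancy rule and none as gifted, while ten-year-old norms flag three and one respectively. Setting the norm age back to zero mimics the reversal that follows the release of a newly standardized test.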

As documented by Ulric Neisser in The Rising Curve (1998), the Flynn effect raises important nature-versus-nurture questions about the relative contributions of genes and environment to intelligence, and about how well IQ estimates intelligence. The Rising Curve was written in response to Richard Herrnstein and Charles Murray’s best-selling book The Bell Curve (1994), in which the authors argued that IQ tests assess intelligence well and that low IQ is associated with a wide range of negative life outcomes, from criminality to risk of divorce. Herrnstein and Murray used behavioral genetics data to make a case for a strong genetic influence on IQ, including on ethnic differences in mean IQ scores. Neisser countered that IQ scores have been rising too rapidly to be explained by genetics alone, which suggests strong environmental influences on IQ scores. Neisser’s book noted that possible nongenetic explanations for the Flynn effect include increasing environmental complexity, new schooling techniques, and improved nutrition.

Just as there is no consensus about the origins of the Flynn effect, the evidence conflicts on whether it is continuing unabated. In the United States, comparison-study data for the Wechsler Adult Intelligence Scale-Third Edition (standardized in 1995) suggested that the rate of IQ gains might be diminishing. Longitudinal data from IQ-test-like draft-board examinations of all draft-eligible males in some European countries also documented a leveling-off of score gains during the 1990s. However, comparison-study data for the Stanford-Binet Intelligence Scale-Fifth Edition and the Wechsler Intelligence Scale for Children-Fourth Edition (both normed around 2001) suggest that the best current estimate of the rate of IQ test improvement in the United States is still around .3 points per year (Flynn 2006).
