Karl Pearson was one of the principal architects of the modern theory of mathematical statistics. His interests ranged from mathematical physics, astronomy, philosophy, history, literature, socialism, and the law to Darwinism, evolutionary biology, heredity, Mendelism, eugenics, medicine, anthropology, and craniometry. His major contribution, however, by his lights and by posterity’s, was to establish and advance the discipline of mathematical statistics.
The second son of William Pearson and Fanny Smith, Carl Pearson was born in London on March 27, 1857. In 1879 the University of Heidelberg changed the spelling of his name when it enrolled him as "Karl Pearson"; five years later he adopted this variant of his name and eventually became known as "KP." His mother came from a family of seamen and mariners, and his father was a barrister and Queen’s Counsel. The Pearsons were a family of dissenters and of Quaker stock. By the time Carl was twenty-two he had rejected Christianity and adopted "Freethought" as a nonreligious faith that was grounded in science.
Pearson graduated with honors in mathematics from King’s College, Cambridge University in January 1879. He stayed in Cambridge to work in Professor James Stuart’s engineering workshop and to study philosophy in preparation for his trip to Germany in April. His time in Germany was a period of self-discovery, philosophically and professionally. Around this time, he began to write The New Werther, an epistolary novel on idealism and materialism, published in 1880 under the pseudonym of Loki (a mischievous Scandinavian god). In Heidelberg Pearson abandoned philosophy because "it made him miserable and would have led him to short-cut his career" (Karl Pearson, Letter to Robert Parker, 17 August 1879. Archive reference number: NW/Cor.23. Helga Hacker Pearson papers within Karl Pearson’s archival material held at University College London). Though he considered becoming a mathematical physicist, he discarded this idea because he "was not a born genius" (Karl Pearson, Letter to Robert Parker, 19 October 1879. Archive reference number/922. Karl Pearson’s archival material held at University College London). He stayed in Berlin and attended lectures on Roman international law and philosophy.
He returned to London and studied law at Lincoln’s Inn at the Royal Courts of Justice. He was called to the bar at the end of 1881 but practiced for only a very short time. Instead, he began to lecture on socialism, Karl Marx, Ferdinand Lassalle, and Martin Luther from 1880 to 1881, while also writing on medieval German folklore and literature and contributing hymns to the Socialist Song Book. In the course of his lifetime, he produced more than 650 publications, of which 400 were statistical; over a period of twenty-eight years he founded and edited six academic journals, of which Biometrika is the best known.
Having received the Chair of Mechanism and Applied Mathematics at University College London (UCL) in June 1884, Pearson taught mathematical physics, hydrodynamics, magnetism, electricity, and elasticity to engineering students. Soon after, he was asked to edit and complete William Kingdon Clifford’s The Common Sense of the Exact Sciences (1885) and Isaac Todhunter’s History of the Theory of Elasticity (1886).
THE GRESHAM LECTURES ON STATISTICS
Pearson was a founding member of the Men’s and Women’s Club, established in 1885 for the free and unreserved discussion of all matters concerning relationships of men and women. Among the various members was Marie Sharpe, whom he married in June 1890. They had three children, Sigrid, Helga and Egon. Six months after his marriage, he took up another teaching post in the Gresham Chair of Geometry at Gresham College in the City of London (the financial district), which he held for three years concurrently with his post at UCL. From February 1891 to November 1893, Pearson delivered thirty-eight lectures.
These lectures were aimed at a nonacademic audience. Pearson wanted to introduce them to a way of thinking that would influence how they made sense of the physical world. While his first eight lectures formed the basis of his book The Grammar of Science, the remaining thirty dealt with statistics because he thought this audience would understand insurance, commerce, and trade statistics and could relate to games of chance involving Monte Carlo roulette, lotteries, dice, and coins. In 1891 he introduced the histogram (a type of bar chart), and he devised the standard deviation and the variance (to measure statistical variation) in 1893. Pearson’s early Gresham lectures on statistics were influenced by the work of Francis Ysidro Edgeworth, William Stanley Jevons, and John Venn.
Pearson’s last twelve Gresham lectures signified a turning point in his career owing to the Darwinian zoologist W. F. R. Weldon (1860-1906), who was interested in using a statistical approach for problems of Darwinian evolution. Their emphasis on Darwinian population of species, underpinned by biological variation, not only implied the necessity of systematically measuring variation but also prompted the reconceptualization of a new statistical methodology, which led eventually to the creation of the Biometric School at University College London in 1894. Earlier vital and social statisticians were mainly interested in calculating averages and were not concerned with measuring statistical variation.
Pearson adapted the mathematics of mechanics, using the method of moments to construct a new statistical system to interpret Weldon’s asymmetrical distributions, since no such system existed at the time. With this method Pearson established four parameters for curve fitting, showing where the data clustered (the mean), how widely they spread (the standard deviation), whether there was a loss of symmetry (skewness), and whether the shape of the distribution was peaked or flat (kurtosis). These four parameters describe the essential characteristics of any empirical distribution and made it possible to analyze data that resulted in various-shaped distributions.
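The four parameters can be illustrated with a minimal sketch in Python. It is a modern restatement, not Pearson's own procedure or notation: it assumes the usual moment-based definitions (central moments divided by n, skewness as the third moment over the 1.5 power of the second, kurtosis as the fourth moment over the square of the second). The function name `moment_summary` and the sample values are hypothetical and were invented for illustration.

```python
import math

def moment_summary(values):
    """Return (mean, standard deviation, skewness, kurtosis) from sample moments."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4, each divided by n (not n - 1).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    sd = math.sqrt(m2)            # spread
    skewness = m3 / m2 ** 1.5     # zero for a symmetric distribution
    kurtosis = m4 / m2 ** 2       # about 3 for a normal distribution
    return mean, sd, skewness, kurtosis

# Hypothetical measurements, invented for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mean, sd, skewness, kurtosis = moment_summary(sample)
print(mean, sd, skewness, kurtosis)
```

In this sketch a positive skewness indicates a longer right-hand tail, and a kurtosis below 3 indicates a distribution flatter than the normal curve.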
By the time Pearson finished his statistical lectures in May 1894, he had provided the infrastructure of his statistical methodology. He began to teach statistics at University College in October. By 1895 he had worked out the mathematical properties of the product-moment correlation coefficient (which measures the relationship between two continuous variables) and simple regression (used for the linear prediction between two continuous variables). In 1896 he introduced a higher level of mathematics into statistical theory, along with the coefficient of variation, the standard error of estimate, multiple regression, and multiple correlation, and in 1899 he established scales of measurement for continuous and discrete variables. Pearson devised more than eighteen methods of correlation from 1896 to 1911, including the tetrachoric, polychoric, biserial, and triserial correlations and the phi coefficient. Inspired and supported by Weldon, Pearson made four major contributions to statistics: (1) introducing standardized statistical data-management procedures to handle very large sets of data; (2) challenging the tyrannical acceptance of the normal curve as the only distribution on which to base the interpretation of statistical data; (3) providing a set of mathematical statistical tools for the analysis of statistical variation; and (4) professionalizing the discipline of mathematical statistics. Pearson was elected a Fellow of the Royal Society in 1896 and awarded its Darwin Medal in 1898.
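As a rough illustration of the product-moment correlation coefficient and simple regression described above, the following Python sketch uses the standard textbook formulas (covariance over the product of the standard deviations for r, and least squares for the regression line). The function names and the paired observations are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simple_regression(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical paired observations, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
slope, intercept = simple_regression(xs, ys)
print(r, slope, intercept)
```

The coefficient r lies between -1 and 1, and the fitted line gives the linear prediction of one continuous variable from the other.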
Pearson’s ongoing work on curve fitting throughout the 1890s meant that he needed a criterion to determine how good the fit was. He continued to work on improving his methods until he devised his chi-square goodness of fit test in 1900 and introduced the concept of degrees of freedom. Although many other nineteenth-century scientists attempted to find a goodness of fit test, they did not give any underlying theoretical basis for their formulas, which Pearson managed to do. The overriding significance of this test was that statisticians could use statistical methods that did not depend on the normal distribution to interpret their findings. Indeed, the chi-square goodness of fit test represented Pearson’s single most important contribution to the modern theory of statistics, for it substantially raised the practice of mathematical statistics. In 1904 Pearson established the chi-square statistic for discrete variables to be used in contingency tables. Pearson published his statistical innovations from his Gresham and UCL lectures in a set of twenty-three papers, "Mathematical Contributions to the Theory of Evolution," principally in Royal Society publications from 1893 to 1916. He established the first degree course in statistics in Britain in 1915.
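The goodness of fit criterion can be sketched in a few lines of Python. This is a minimal modern illustration rather than a reconstruction of Pearson's 1900 derivation: it assumes the familiar form of the statistic (the sum of squared differences between observed and expected counts, each divided by the expected count) and takes the degrees of freedom as the number of categories minus one, which holds when no parameters are estimated from the data. The function name and the die-roll counts are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 60 rolls of a die, invented for illustration only.
observed = [8, 12, 10, 11, 9, 10]
expected = [10] * 6           # a fair die gives 10 per face on average

chi2 = chi_square_statistic(observed, expected)
# Assumption: no parameters were estimated from the data.
degrees_of_freedom = len(observed) - 1
print(chi2, degrees_of_freedom)
```

A small value of the statistic relative to its degrees of freedom suggests that the observed counts are consistent with the expected ones. Here the statistic is far below the tabulated 5 percent critical value of about 11.07 for five degrees of freedom, so these invented counts give no reason to doubt a fair die.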
PEARSON’S FOUR LABORATORIES
In the twentieth century Pearson founded and managed four laboratories. He set up the Drapers’ Biometric Laboratory in 1903 with a grant from the Worshipful Drapers’ Company (who funded the laboratory until 1933). The methodology incorporated in this laboratory involved the use of his statistical methods and numerous instruments. The problems investigated by the biometricians included natural selection, Mendelian genetics and Galton’s law of ancestral inheritance, craniometry, physical anthropology, and theoretical aspects of mathematical statistics. A year after Pearson established the Biometric Laboratory, the Worshipful Drapers’ Company gave him a grant to launch an Astronomical Laboratory equipped with a transit circle and a four-inch equatorial refractor.
In 1907 Francis Galton (who was then eighty-five years old) wanted to step down as director of the Eugenics Record Office, which he had set up three years earlier; he asked Pearson to take over the office, which Pearson subsequently renamed the Galton Eugenics Laboratory. Pearson had, by then, spent the previous fourteen years developing the foundations of his statistical methodology. His work schedule was so demanding that he took on this role only as a personal favor to Galton. Because Pearson regarded his statistical methods as unsuitable for problems of eugenics, he further developed Galton’s actuarial death rates and family pedigrees for the methodology of the Eugenics Laboratory. The latter procedure led to his twenty-one-volume Treasury of Human Inheritance (1909-1930). In 1924 Pearson set up the Anthropometric Laboratory, made possible by a gift from his student, Ethel Elderton. When Galton died in January 1911, his estate was bequeathed to UCL and he named Pearson as the first professor of eugenics. The Drapers’ Biometric and the Galton Eugenics laboratories, which continued to function separately, became incorporated into the Department of Applied Statistics.
Although Pearson was a eugenicist, he eschewed eugenic policies. For him and his British contemporaries (e.g., Herbert Spencer, George Bernard Shaw, H. G. Wells, Marie Stopes, and Virginia Woolf), eugenics was principally a discourse about class, whereas in Germany and America the focus was on racial purity. The British were anxious that the country would be overrun by the poor unless their reproduction lessened; the middle classes were thus encouraged to have more children. In any case eugenics did not lead Pearson to develop any new statistical methods, nor did it play any role in the creation of his statistical methodology.
His wife, Marie Sharpe, died in 1928, and in 1929 he married Margaret Victoria Child, a co-worker in the Biometric Laboratory. Pearson was made Emeritus Professor in 1933 and given a room in the Zoology Department at UCL, which he used as the office for Biometrika. From his retirement until his death in 1936, he published thirty-four articles and notes and continued to edit Biometrika.
SCHOLARSHIP ON PEARSON
Pearson’s statistical work and innovations, his philosophy and his ideas about Darwinism, evolutionary biology, Mendelism, eugenics, medicine, and elasticity have been of considerable interest to innumerable scientists and scholars for more than a century. Throughout the twentieth century, many commentators viewed Pearson as a disciple of Francis Galton who merely expanded Galton’s ideas on correlation and regression. Consequently, a number of scholars have falsely assumed that Pearson’s motivation for creating a new statistical methodology arose from problems of eugenics. Among writers who have taken this view are Daniel Kevles, Bernard Norton, Donald Mackenzie, Theodore Porter, Richard Soloway, and Tukufu Zuberi. However, using substantial corroborative historical evidence in Pearson’s archives, Eileen Magnello (1999) provided compelling documentation that Pearson not only managed the Drapers’ Biometric and the Galton Eugenics laboratories separately but also that they occupied separate physical spaces, that he maintained separate financial accounts, that he established very different journals, and that he created two completely different methodologies. Moreover, he took on his work in the Eugenics Laboratory very reluctantly and wanted to relinquish the post after one year. Pearson emphasized to Galton that the sort of sociological problems that he was interested in pursuing for his eugenics program were markedly different from the research that was conducted in the Drapers’ Biometric Laboratory.
Juxtaposing Pearson alongside Galton and eugenics has distorted the complexity and totality of Pearson’s intellectual enterprises, since there was virtually no relationship between his research in "pure" statistics and his agenda for the eugenics movement. This long-established but misguided impression can be attributed to (1) an excessive reliance on secondary sources containing false assumptions, (2) the neglect of Pearson’s voluminous archival material, (3) the use of a minute portion of his 600-plus published papers, (4) a conflation of some of Pearson’s biometric and craniometric work with that of eugenics, and (5) a blatant misinterpretation and misrepresentation of Pearsonian statistics.
Continuing to link Galton with Pearson, Michael Bulmer (2003) suggested that the impetus to Pearson’s statistics came from his reading of Galton’s Natural Inheritance. However, Magnello (2004) argued that this view failed to take into account that Pearson’s initial reaction to Galton’s book in March 1889 was actually quite cautious. It was not until 1934, almost half a century later, when Pearson was 78 years old, that he reinterpreted the impact Galton’s book had on his statistical work in a more favorable light, long after he had established the foundations of modern statistics.
The central role that Weldon played in the development of Pearson’s statistical program has been almost completely overlooked by most scholars, except for Robert Olby (1988) and Peter Bowler (2003), who gave Weldon greater priority than Galton in Pearson’s development of mathematical statistics as it related to problems of evolutionary biology. Weldon’s role in Pearson’s early published statistical papers was acknowledged by Churchill Eisenhart (1974), Stephen Stigler (1986), and A. W. F. Edwards (1993). In all her papers, Magnello addressed Weldon’s pivotal role in enabling Pearson to construct a new mathematically based statistical methodology.
Norton (1978a, 1978b) and Porter (2004) argued that Pearson’s iconoclastic and positivistic Grammar of Science played a role in the development of Pearson’s statistical work. However, Magnello (1999, 2005a) disputed this and argued that while The Grammar of Science represents his philosophy of science as a young adult, it does not reveal everything about his thinking and ideas, especially those in connection with his development of mathematical statistics. Thus, she maintained, it is not helpful to see this book as an account of what Pearson was to do throughout the remaining forty-two years of his working life.
Although long-standing claims have been made by various commentators throughout the twentieth and early twenty-first centuries that Pearson rejected Mendelism, Magnello (1998) showed that Pearson did not reject Mendelism completely but that he accepted the fundamental idea for discontinuous variation. Moreover, Philip Sloan (2000) argued that the biometricians’ debates clarified issues in Mendelism that otherwise might not have been developed with the rigor that they were to achieve.
Additionally, virtually all historians of science have failed to acknowledge that Pearson’s and Galton’s ideas, methods, and outlook on statistics were profoundly different. However, Bowler (2003) detected differences in their statistical thinking because of their different interpretations of evolution, and Stigler acknowledged their diverse approaches to statistics in his The History of Statistics (1986). Magnello (1996, 1998, 1999, 2002) explained that whereas Pearson’s main focus was goodness of fit testing, Galton’s emphasis was correlation; Pearson’s statistics rested on a higher level of mathematics than Galton’s; Pearson was interested in very large data sets (more than 1,000), whereas Galton was more concerned with smaller data sets of around 100 (owing to the explanatory power of percentages); and Pearson undertook long-term projects over several years, while Galton wanted faster results. Moreover, Galton thought all data had to conform to the normal distribution, whereas Pearson emphasized that empirical distributions could take on any number of shapes.
Given the pluralistic nature of Pearson’s scientific work and the complexity of his many statistical innovations twinned with his multifaceted persona, Pearson will no doubt continue to be of interest for many future scholars. Pearson’s legacy of establishing the foundations of contemporary mathematical statistics helped to create the modern world view, for his statistical methodology not only transformed our vision of nature but also gave scientists a set of quantitative tools to conduct research, accompanied by a universal scientific language that standardized scientific writing in the twentieth century. His work provided the foundations for such statisticians as R. A. Fisher, who went on to make further advancements in the modern theory of mathematical statistics.