The genetic structure of human pathogens

The contemporary genetic structure and the evolutionary history of human pathogens are important for control and prevention strategies, and they also provide interesting case studies in evolutionary biology. Recently, large-scale DNA sequence data have become available for a wide variety of pathogens, and have revealed population structures as diverse as the pathogens themselves. We describe the patterns observed in several of the most notorious killers, and ask what makes them so different from each other.

The genetic structuring that a human pathogen exhibits varies not only from species to species but also from one place to another. A good example is the protozoan parasite Plasmodium falciparum, responsible for the most virulent form of human malaria. Plasmodium falciparum populations exhibit a wide range of genetic structures that highlight the importance of local epidemiology. Unlike many human pathogens (viruses and bacteria in particular), P. falciparum has an obligate sexual stage in its life cycle. Male and female gametes fuse in the mosquito vector to form a short-lived diploid stage. In general, recombination breaks down associations between particular alleles at different loci (linkage disequilibrium). Nevertheless, strong linkage disequilibrium caused by inbreeding is typical in areas of low prevalence where self-fertilization is commonplace, such as South America. In contrast, P. falciparum populations in Africa exhibit high heterozygosity and low linkage disequilibrium, more typical of outbreeding populations, while populations in Southeast Asia show intermediate levels (Anderson et al., 2000).
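
The two summary statistics mentioned above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the function names, the 0/1 allele coding, and the toy samples are hypothetical and are not taken from Anderson et al. (2000). It computes the standard two-locus disequilibrium coefficient D (the haplotype frequency minus the product of the allele frequencies), its normalized form r squared, and expected heterozygosity (one minus the sum of squared allele frequencies).

```python
from collections import Counter


def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic loci.

    `haplotypes` is a list of (allele_at_locus_A, allele_at_locus_B) pairs,
    with alleles coded 0 or 1, one pair per sampled haploid genome.
    This coding is an assumption made for the illustration.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # zero when loci are independent
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


def expected_heterozygosity(alleles):
    """Probability that two alleles drawn with replacement differ."""
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Toy samples (hypothetical): a selfing-like sample in which only two
# haplotypes occur, and a well-mixed sample with all four combinations.
inbred = [(0, 0)] * 5 + [(1, 1)] * 5
outbred = [(0, 0), (0, 1), (1, 0), (1, 1)] * 3

print(linkage_disequilibrium(inbred))
print(linkage_disequilibrium(outbred))
print(expected_heterozygosity(["a", "a", "b", "b"]))
```

In the first toy sample the alleles at the two loci are perfectly associated, as might be expected where self-fertilization predominates. In the second, every combination is equally common, so the association vanishes.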


Genetic structuring can persist stably in a pathogen population in spite of transient epidemic waves, a pattern that is seen in Neisseria meningitidis. These waves can often be attributed to the transmission of specific genotypes (Zhu et al., 2001). The agent of meningococcal meningitis and septicemia, N. meningitidis, is a common, diverse bacterium (Jolley et al., 2000), which undergoes recombination at a rate sufficiently high to render the phylogenetic signal between loci incongruent (Holmes et al., 1999). Zhu et al. (2001) showed how the importation of gene fragments from other Neisseria species, which might actually be more frequent than within-species recombination (Linz et al., 2000), can drive these waves. Since the 1960s, subgroup III N. meningitidis has caused three successive waves of meningococcal disease worldwide. Within each wave, interspecific recombination can lead to escape variants with novel antigenic properties. These variants enjoy transient superiority and rise in frequency, but do not subsequently persist because they do not successfully colonize new host populations. So, although immunological novelty is an important advantage, stabilizing selection may account for the persistence of lineages over time.

Not just recent epidemic waves but the origin and subsequent spread of the entire species may be recorded in the population structure of a recently evolved pathogen. One such historically important pathogen is Yersinia pestis, the highly virulent bacterium responsible for plague. The first recorded plague pandemic dates to 541-767 A.D., and is thought to have been imported from East or Central Africa and spread from Egypt to Mediterranean countries. The extremely limited nucleotide sequence diversity of Y. pestis makes genetic analysis difficult. A study by Achtman et al. (1999) using restriction fragment length polymorphisms revealed that Y. pestis is a highly uniform clone of Y. pseudotuberculosis, a global pathogen of wild and domestic animals, which infrequently causes a self-limiting infection in humans. On the basis of a comparison of the genome sequences of Y. pestis and Y. pseudotuberculosis, Prentice et al. (2001) have postulated that the critical event in the emergence of Y. pestis was the acquisition of a plasmid conferring pathogenicity by horizontal gene transfer. Achtman et al. (1999) fitted a model of rapid population expansion to date the origin of the pathogen at 1500-20 000 years ago, consistent with the date of the first pandemic.

It is somewhat counterintuitive to think that a microbe as strictly clonal as Y. pestis might have originated from a recombination event in which it acquired a pathogenicity-conferring plasmid. The emergence of the apparently nonrecombining (McVean et al., 2002) hepatitis C virus (HCV) in Egypt has been attributed to human medical intervention. HCV, in contrast to Y. pestis, exhibits high levels of sequence diversity despite its clonal nature. A leading cause of liver cancer and cirrhosis, HCV is genetically structured into various subtypes, of which 4a is prevalent in Egypt. Pybus et al. (2003) performed statistical analysis of the genetic structure within subtype 4a. Their coalescent approach was based on an epidemiological model, and dated the period of expansion of subtype 4a to between 1930 and 1955, coinciding with an extensive anti-schistosomiasis injection campaign in the country. Moreover, the high rate of spread estimated by the analysis is consistent with existing hypotheses that implied spread by unsterile equipment.
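
To give a sense of how a coalescent model links a genealogy to population growth, here is a minimal sketch. It is not the method of Pybus et al. (2003), whose Bayesian analysis and demographic model are more elaborate. It assumes a haploid population with present-day size `n0` that has grown exponentially at rate `growth_rate` per generation; the function name and all parameter values are hypothetical. The sketch draws coalescence times on a rescaled clock, on which k lineages coalesce at rate k(k-1)/2, and then converts them to generations before the present.

```python
import math
import random


def coalescent_times_exponential_growth(n_samples, n0, growth_rate, rng=None):
    """Simulate coalescence times, in generations before the present.

    Assumes N(t) = n0 * exp(-growth_rate * t) going backwards in time,
    i.e. a population that has been growing exponentially up to today.
    Returns n_samples - 1 increasing times, the last being the TMRCA.
    """
    rng = rng or random.Random()
    times = []
    tau = 0.0  # rescaled time: integral of 1 / N(s) ds from 0 to t
    for k in range(n_samples, 1, -1):
        pairs = k * (k - 1) / 2
        tau += rng.expovariate(pairs)
        # Invert tau = (exp(g * t) - 1) / (g * n0) to recover real time t.
        t = math.log1p(growth_rate * n0 * tau) / growth_rate
        times.append(t)
    return times


# Hypothetical parameter values chosen only for illustration.
times = coalescent_times_exponential_growth(
    n_samples=10, n0=1_000_000, growth_rate=0.1, rng=random.Random(1)
)
print(times)
```

Under rapid growth the simulated coalescence events cluster in the past, when the population was small. That clustering is the kind of signal that lets a fitted model place the expansion in a particular period.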

There have been attempts to extract information about historic changes in prevalence even from highly recombining pathogens. The epitome of these must be the human immunodeficiency virus (HIV), which, according to population genetic estimates, is perhaps the most highly recombining of all microorganisms (McVean et al., 2002). Lemey et al. (2003) fitted a similar model to that of Pybus et al. (2003) to HIV-2 gene sequences obtained from patients in Guinea-Bissau. They found that the transmission of HIV-2 increased around 1955-1970, overlapping with the 1963-1974 war of independence in the country. Their analysis hints at the role of social changes, such as war, in the establishment of emergent infections.

If the driving force in the historic spread and subsequent globalization of a pathogen was the migration of its host populations, then a pathogen's contemporary genetic structure can contain information about its host's past. This has been seen in Helicobacter pylori, a common bacterium that colonizes the human gut and, through a process of constant irritation, is apparently responsible for most of the stomach maladies found in man, including peptic ulcer and gastric cancer. Recombination in H. pylori is so high that different loci, and polymorphisms within loci, appear to be in linkage equilibrium (Suerbaum et al., 1998), the result of very frequent recombination during mixed colonization by unrelated strains (Falush et al., 2001). Nevertheless, Falush et al. (2003) found that contemporary H. pylori could be divided into seven populations with distinct geographical distributions. The high sequence diversity and residual linkage disequilibrium in modern populations allowed the identification of ancestral populations from Africa, Central Asia, and East Asia. By reconstructing the genetic profiles of these ancient populations, they were able to propose putative migration routes that parallel hypothesized ancient human migration routes. Thus, analysis of H. pylori from human populations appears to provide independent insight into the details of human migrations.

The exceptional population structure of H. pylori allows reconstruction of the host's evolutionary history, highlighting the intimate association of the host and parasite over thousands of years. But why does diversity within H. pylori correlate so well with human migrations, when diversity within other pathogens does not?

In fact, a lack of correlation is generally much easier to explain than a good correlation. Both HIV and N. meningitidis spread epidemically, with a transmission time measured in weeks or years rather than decades. Although geographical and cultural barriers can slow down their spread, current levels of human movement have proved sufficient to allow their global dissemination within years or decades. Mycobacterium tuberculosis, the cause of TB and human suffering that has inspired many literary works, spreads more slowly, and does show clear patterns of geographic structuring (Hirsh et al., 2004). However, because TB is apparently entirely clonal (Supply et al., 2003), any selective events associated with the adaptation of TB to the human host, or host-pathogen coevolution, will affect the population structure across the entire genome, an example of which may be the global emergence of the hypervirulent Beijing genotype (Lillebaek et al., 2003).

The population structure of H. pylori is exceptional because its biology includes several key properties. These include slow transmission (which typically seems to occur in families), and, in the absence of antibiotics, typically life-long chronic infection. Further, it mutates at very high rates, which results in informative signals at each gene, while frequent recombination uncouples the signal observed at most genes from selective events that occur at antigenically important loci. Are there other pathogens with a similar suite of properties? Despite a few candidates (e.g., JC virus, Agostini et al., 1997), none really seems to fit the bill, but only time will tell which of our unwanted but intimate companions has most to reveal about our own history.
