Studies of human genetic history using the Y chromosome

The human Y chromosome’s raison d’être is sex (in particular, male sex determination), and this louche connection has made it perhaps the strangest segment of our genome. It is specific to males and constitutively haploid in a diploid organism, and therefore escapes recombination over most of its length, apart from the two pseudoautosomal regions in which crossing-over with the X chromosome occurs. This absence of intergenerational reshuffling means that it passes from father to son as an immense haplotype, changing only by mutation, and so contains within it a relatively straightforward record of its past. Over the last few years, many useful DNA polymorphisms (see Article 68, Normal DNA sequence variations in humans, Volume 4) have been discovered in the 23 Mb of nonrecombining euchromatin, and this has led to an explosion in studies of human genetic history using the Y chromosome.
A set of over 200 well-characterized binary polymorphisms representing unique or rare events in human evolution, mostly SNPs (single nucleotide polymorphisms), has been used to define Y-chromosomal haplotypes, known as haplogroups. These can be arranged into a unique phylogeny (Figure 1a) rooted by comparison to chimpanzee sequences (Underhill et al., 2001; Y Chromosome Consortium, 2002).
The Y chromosome also carries faster mutating markers, in particular microsatellites. These can be used to examine diversity within haplogroups, allowing time to most recent common ancestor (TMRCA) to be estimated, given an estimate of mutation rate (typically ~0.2% per microsatellite per generation). Intrahaplogroup diversity can be compared between populations, allowing deductions about the geographical origins of population expansions. Most studies analyze a heterogeneous set of 6-20 microsatellites, but recently (Kayser et al., 2004), the entire chromosome has been searched for new markers, resulting in the isolation of 172 novel microsatellites to add to the known set of ~50. This resource should allow new accuracy in the estimation of TMRCAs, and the choice of sets of markers with relatively uniform and predictable mutational properties.
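As a rough illustration of how within-haplogroup microsatellite diversity translates into a TMRCA, the sketch below uses the average squared distance (ASD) in repeat number from a founder haplotype. Under a simple single-step mutation model, ASD is expected to grow by the mutation rate each generation, so dividing ASD by the mutation rate gives an estimate in generations. This is one common estimator, not necessarily the method used in the studies cited here. The repeat counts are invented, the founder is assumed to be the modal haplotype, and the 25-year generation time is an assumption; only the ~0.2% mutation rate comes from the text.

```python
from collections import Counter

MUTATION_RATE = 0.002       # per microsatellite per generation (~0.2%, from the text)
YEARS_PER_GENERATION = 25   # assumption for illustration, not from the text


def modal_haplotype(haplotypes):
    """Return the most common repeat count at each locus (used as the assumed founder)."""
    n_loci = len(haplotypes[0])
    return [
        Counter(h[locus] for h in haplotypes).most_common(1)[0][0]
        for locus in range(n_loci)
    ]


def tmrca_generations(haplotypes, mutation_rate=MUTATION_RATE, founder=None):
    """Estimate TMRCA in generations as ASD / mutation rate."""
    if founder is None:
        founder = modal_haplotype(haplotypes)
    # Squared repeat-count difference from the founder, for every chromosome and locus.
    squared_diffs = [
        (allele - ancestral) ** 2
        for h in haplotypes
        for allele, ancestral in zip(h, founder)
    ]
    asd = sum(squared_diffs) / (len(haplotypes) * len(founder))
    return asd / mutation_rate


# Hypothetical data: five chromosomes from one haplogroup, typed at four microsatellites.
sample = [
    [14, 12, 23, 10],
    [14, 12, 23, 10],
    [14, 13, 23, 10],
    [15, 12, 23, 10],
    [14, 12, 22, 10],
]

generations = tmrca_generations(sample)
years = generations * YEARS_PER_GENERATION
print(f"TMRCA ~ {generations:.0f} generations (~{years:.0f} years)")
```

With these toy data there are three single-step differences across 20 locus observations, so ASD is 0.15 and the estimate is about 75 generations. Real estimates carry wide uncertainty, because mutation rates differ between microsatellites and the founder haplotype is itself inferred.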
How are Y haplogroups distributed in different populations? The deepest-rooting clades within the Y phylogeny (haplogroups A and B) are almost entirely confined to sub-Saharan Africa (Figure 1b), and one estimate for TMRCA based on SNPs and a model of exponential population growth (Thomson et al., 2000) is 59 thousand years ago (KYA), with 95% confidence interval limits of 40-140 KYA. These observations are compatible with a recent out-of-Africa origin for modern humans.
Figure 1 Frequencies of the major Y haplogroups in five continental populations. (a) The Y phylogeny, showing the 18 major haplogroups, A to R. (b) Relative haplogroup frequencies (represented by area of filled circles) in indigenous populations of sub-Saharan Africa (Af), Europe (Eu), East Asia (EA), Oceania (Oc), and the Americas (Am).
The distribution of Y haplogroups shows a generally high degree of geographical differentiation, with, in one study (Seielstad et al., 1998), 65% of genetic variance existing between populations, and 35% within. This contrasts starkly with an autosomal figure of 15% between- and 85% within-population variance (Barbujani et al., 1997). The difference has been ascribed (Seielstad et al., 1998) to the predominant practice of patrilocality, in which women tend to move to the husband’s birthplace upon marriage, although this effect may only be appreciable at the local rather than global level (Wilder et al., 2004). Another contributory factor is genetic drift: stochastic variation in haplotype frequency from one generation to the next due to variance in offspring number, which is particularly marked on the Y because its effective population size is one-quarter of that of any autosome.
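The one-quarter figure follows from counting gene copies: a breeding pair carries four copies of each autosome but only one Y. The sketch below makes that count explicit and shows its consequence for drift, using the standard idealized (Wright-Fisher) expectation that gene diversity decays by a factor of (1 - 1/n) per generation for n gene copies. The equal sex ratio, the population size, the number of generations, and the function names are illustrative assumptions, not values from the studies cited.

```python
def gene_copies(n_males, n_females):
    """Count copies of an autosome and of the Y in a breeding population."""
    autosomal = 2 * (n_males + n_females)  # two copies per individual
    y_linked = n_males                     # one copy per male, none in females
    return autosomal, y_linked


def expected_diversity(h0, n_copies, generations):
    """Expected gene diversity remaining after drift in an idealized population."""
    return h0 * (1 - 1 / n_copies) ** generations


# Hypothetical population with an equal sex ratio.
autosomal, y_linked = gene_copies(n_males=500, n_females=500)
ratio = y_linked / autosomal  # one-quarter when males and females are equally numerous

print(f"Y copies / autosomal copies = {ratio}")
for label, n in (("autosome", autosomal), ("Y chromosome", y_linked)):
    remaining = expected_diversity(1.0, n, generations=1000)
    print(f"{label}: {n} copies, {remaining:.0%} of initial diversity after 1000 generations")
```

In this toy setting, the autosome retains roughly 60% of its initial diversity after 1000 generations while the Y retains under 15%. Greater variance in male than in female offspring number would push the Y's effective size lower still.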
High geographical differentiation makes the Y a powerful system for the study of past population movements, including colonization and admixture. In the case of the Lemba, a Bantu-speaking South African tribe, oral history telling of descent from Jews who came from the north by boat finds support in the fact that they carry a high frequency of a Y-chromosomal haplotype that is typical of Jewish Middle Eastern populations (Thomas et al., 2000). Combining Y studies with analysis of maternally inherited mitochondrial DNA (mtDNA; see Article 5, Studies of human genetic history using mtDNA variation, Volume 1) has revealed evidence for sex-biased admixture, in which men, but not women, from one population have contributed genes to another population. Examples are seen in Greenland, where all mtDNAs analyzed are of Native American origin, while up to 64% of Y chromosomes are European (Bosch et al., 2003); and in Brazil, where European Y chromosomes are close to 100% (Carvalho-Silva et al., 2001) but mtDNAs originate approximately equally from Europeans, Africans (contributed by imported slaves), and indigenous Native Americans.
High-resolution Y haplotype analysis has revealed the extraordinary extent to which social structures can allow a single individual to contribute disproportionately to future populations (Zerjal et al., 2003). About 8% of the Y chromosomes from a large region of Central Asia (~0.5% of the global total) belong to a very closely related lineage cluster with an estimated TMRCA of only ~1000 years, spread across 16 different populations from the Pacific to the Caspian Sea. The age of the cluster, its distribution, and its probable origin in Mongolia are all consistent with Genghis Khan and his dynasty being responsible for its unprecedented spread.
The discussion above assumes that natural selection is not influencing Y haplotype distributions. Under positive selection, a beneficial mutation could have led to the spread or fixation of a particular Y haplotype in the past. There is no convincing evidence that such a “selective sweep” has occurred, and apparent evidence from diversity data could always be explained as the effect of population expansion (see Article 7, Genetic signatures of natural selection, Volume 1). Deleterious mutations can occur in essential genes, including those involved in sperm production, but purifying selection acts continually to weed these out; also, the multicopy nature of many spermatogenesis genes may allow correction of mutations through intrachromosomal recombination (gene conversion) with unmutated copies (Rozen et al., 2003). Some studies of selection have focused on particular deleterious phenotypes known (or suspected) to be associated with the Y chromosome. Significant effects of Y haplotype on the probability of, for example, having low sperm count (Krausz et al., 2001) or prostate cancer (Paracchini et al., 2003) have been found, but are likely to have only minor effects on haplotype frequencies in populations. With current knowledge, it therefore seems reasonable to treat haplotype frequencies as an outcome of past population processes. However, advocates of the Y chromosome must not forget that, whatever its allure, it represents but a single “run” of the genealogical process, and that combination with other loci in the genome is necessary to provide a balanced view of the human past from DNA evidence.
