The construction and use of radiation hybrid maps in genomic research

1. Introduction

At the beginning of the 1960s, Barski et al. (1960) reported the occurrence of spontaneous cell fusion events. A few years later, new methodologies using UV-inactivated Sendai virus (Yerganian and Nell, 1966) or polyethylene glycol (PEG) treatments made it possible to induce cell fusion and to produce heterocaryotic somatic hybrid cells (Pontecorvo, 1975). In addition, hybrid cells, originally called recombinant cells, could be separated from their parental cells by culture in selective media. In 1975, Goss and Harris proposed for the first time the use of these techniques for genetic mapping purposes: they had observed that lethally X-ray irradiated donor cells could fuse with a receptor cell, giving rise to a viable hybrid cell line. This so-called radiation hybrid (RH) cell line possessed a heterocaryon with chromosomes corresponding to a mosaic of chromosome fragments from the donor and the receptor cells: the higher the irradiation dose, the smaller the expected average size of the donor chromosome fragments. The principle is thus very similar to the classical linkage mapping strategy, since X-ray-induced breakage points mimic the recombination points produced during meiosis.

In this review, we will consider the main principles of the construction and characterization of RH panels, their advantages over other mapping tools for the development of high-resolution genetic and comparative maps, and their possible contributions to different genome mapping projects.


2. Construction of RH panels: principles

2.1. Production and selection of RH cells

As shown in Figure 1, nucleotide biosynthesis can be accomplished through two biological pathways, that is, the main and the salvage pathways. In selective HAT (Hypoxanthine Aminopterin Thymidine) medium, aminopterin blocks the main pathway while the two precursors (hypoxanthine and thymidine) are available for the salvage pathway. To construct an RH cell line, the chosen receptor cell line (usually hamster or mouse) is deficient for either thymidine kinase (TK) or hypoxanthine-guanine phosphoribosyl transferase (HGPRT). After irradiation of the donor cells from the species of interest, usually with a dose between 3000 and 12 000 Rad, fusion with the receptor cells is induced. When grown in selective HAT medium, nonfused (tk- or hgprt-) receptor cells, which lack one of the two key enzymes of the salvage pathway, and lethally irradiated nonfused donor cells are counterselected. Only RH cells in which the deficient tk or hgprt gene of the receptor cell has been complemented by its functional counterpart, carried by one of the integrated donor chromosomal fragments, will then survive (Figure 2).

Figure 1 Nucleotide biosynthesis pathway. In the selective HAT medium, the main pathway is blocked by aminopterin (A of HAT), a structural analog of folic acid. Cells need the two precursors hypoxanthine (H of HAT) and thymidine (T of HAT) to produce, respectively, ribo- and deoxyribonucleotides through the salvage pathway. Mutant cells deficient for either HGPRT (hypoxanthine-guanine phosphoribosyl transferase) or TK (thymidine kinase) cannot use the salvage pathway and will thus die.

During the culture of RH cell lines, chromosomal segments originating from the donor cells are randomly eliminated, while chromosomes from the receptor cell are conserved. To our knowledge, the mechanism behind this phenomenon has not yet been elucidated. Consequently, each independent RH cell line constituting an RH panel (usually composed of about one hundred lines) will contain a different set of chromosomal segments from the donor genome. Nevertheless, a bias will remain for the genomic region containing the marker gene of selection (e.g., in man, tk and hgprt are respectively located on HSA17 and HSAX chromosomes and, therefore, the corresponding regions will be preferably retained).

From a practical point of view, to allow for experimental replication and data comparisons using the same lines, it is important to extract sufficiently large quantities of DNA from each line since additional rounds of culture of the RH cell lines will lead to a set of donor chromosomal segments different from the original one. The main principles of the production and selection of RH cell lines are summarized in Figure 2.

Figure 2 Schematic representation of the construction of RH cell lines

2.2. “Haploid” and “diploid” RH panels

Although the principle of producing radiation hybrids had been known since the mid-1970s, the use of RH panels to construct maps remained limited for about 15 years owing to the paucity of available genes and markers and to the fact that PCR was not yet a mature technology. In 1990, Cox and coworkers were the first to demonstrate the feasibility of producing an RH panel and its efficiency for constructing an RH map of human chromosome 21 (HSA21). They also presented the first principles of the statistical tools necessary for mapping a marker. For this large-scale RH panel, the donor cells were obtained from a somatic hybrid cell line carrying a “haploid” copy of HSA21. After irradiation with an X-ray dose of 8000 Rad and fusion with a rodent receptor cell line, the authors produced 103 haploid RH cell lines, which were found to randomly retain between 30 and 60% of human chromosome 21.

At that point, the generalization of this procedure to produce whole-genome RH panels (WGRHP) was not straightforward since monohybrid somatic cell lines were difficult or nearly impossible to produce for most species. Thus, too many RH lines would have to be produced for the panel to be efficient. A new strategy, consisting of using donor cells derived from diploid cell lines (human fibroblasts), was then developed, resulting in “diploid” RH cell lines (Walter et al., 1994). It is interesting to note that this methodology is very similar to the initial proposition made by Goss and Harris (1975). This new strategy was then used to generate RH panels for many different species such as mouse (Schmitt et al., 1996; McCarthy et al., 1997), cattle (Womack et al., 1997; Rexroad et al., 2000; Williams et al., 2002), dog (Priat et al., 1998), pig (Yerle et al., 1998; Yerle et al., 2002), rat (Watanabe et al., 1999; McCarthy et al., 2000), cat (Murphy et al., 1999), horse (Kiguwa et al., 2000; Chowdhary et al., 2002), macaque (Murphy et al., 2001), zebra fish (Geisler et al., 1999), and chicken (Morisson et al., 2002). Table 1 presents an overview of the different WGRHPs available for the species cited above.

Table 1 Overview of some whole-genome radiation hybrid panels (WGRHP) available in different species

Species | Reference | Irradiation dose (Rad) | Number of lines | Mean estimated retention frequency | kb/cR (a) | Resolution in kb (b)
Cat | Murphy et al. (1999); Murphy et al. (2001) | 5000 | 93 | 0.39 | 195 | 538
Cattle | Womack et al. (1997); Band et al. (2000) | 5000 | 101 | 0.22 | 330 | 1500
Cattle | Rexroad et al. (2000) | 12 000 | 88 | 0.30 | n.c. | n.c.
Cattle | Williams et al. (2002) | 3000 | 94 | 0.23 | 75 | 347
Chicken | Morisson et al. (2002); Pitel et al. (2004) | 6000 | 90 | 0.22 | 43.7-63 | 221-318
Dog | Priat et al. (1998) | 5000 | 126 | 0.21 | 166 | 627
Horse | Kiguwa et al. (2000) | 3000 | 94 | 0.28 | n.c. | n.c.
Horse | Chowdhary et al. (2002); Chowdhary et al. (2003) | 5000 | 92 | 0.44 | 200 | 494
Macaque | Murphy et al. (2001) | 5000 | 93 | 0.33 | 330 | 1500
Man | Gyapay et al. (1996) | 3000 | 93 | 0.32 | 208 | 699
Man | Stewart et al. (1997) | 10 000 | 83 | 0.16 | 29 | 218
Mouse | Schmitt et al. (1996) | 3000 | 164 | 0.18 | 3500 | 11 856
Mouse | McCarthy et al. (1997) | 3000 | 94 | 0.28 | 98 | 372
Pig | Yerle et al. (1998) | 7000 | 126 | 0.35 | 37 | 84
Pig | Yerle et al. (2002) | 12 000 | 90 | 0.35 | 14 | 44
Rat | Watanabe et al. (1999) | 3000 | 96 | 0.27 | 106 | 409
Zebra fish | Geisler et al. (1999) | 3000 | 94 | 0.18 | 61 | 361

(a) In some cases, we assume 1 cM is equivalent to 1 Mb to estimate the ratio kb/cR.

(b) The resolution was estimated as the average size of retained fragments (= 100 times the ratio kb/cR, following the definition of the unit cRay) divided by the product of the number of lines and the average retention frequency.
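The resolution estimate described in the second table footnote can be sketched in a few lines of Python. This is only an illustration of that footnote's arithmetic; the function name `resolution_kb` and its argument names are hypothetical and do not come from any of the cited papers.

```python
def resolution_kb(kb_per_cr: float, n_lines: int, retention: float) -> float:
    """Panel resolution in kb, following the table footnote.

    The average retained fragment size is taken as 100 * (kb/cR), and it is
    divided by the expected number of lines retaining a given fragment
    (number of lines * mean retention frequency).
    """
    if n_lines <= 0 or not 0.0 < retention <= 1.0:
        raise ValueError("need n_lines > 0 and 0 < retention <= 1")
    mean_fragment_kb = 100.0 * kb_per_cr
    return mean_fragment_kb / (n_lines * retention)


# Example with the dog panel values from Table 1 (166 kb/cR, 126 lines, 0.21).
dog_resolution = resolution_kb(166, 126, 0.21)
```

Applied to the rows of Table 1, this calculation reproduces most of the tabulated resolutions to the nearest kb (for instance the dog, cat, and 12 000-Rad pig panels).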

3. RH mapping methodology

3.1. Definition and control of the parameters of interest

RH mapping methodology is mostly inspired by the linkage mapping methodologies leading to a similarity in terminology in both cases. The two important parameters to be considered in RH mapping are the probability of breakage between two markers and the retention probability of the donor chromosomal segment (Figure 3).

Figure 3 RH mapping principles. Breakage points generated by X-ray irradiation in the donor cells mimic meiotic recombinations, and it becomes possible for a given chromosomal segment (here between markers A and B) to sort out nonrecombinant “parental” haplotypes (no breakage between A and B and the resulting segment is retained or eliminated) and “recombinant” haplotypes (breakage between A and B and only one marker is retained). The analogy with linkage mapping can be extended further with the identification of “double recombinant” RH cell lines (breakage between A and B, but the two markers are nevertheless retained or eliminated together). Observable results from RH panel screening for the presence/absence of markers are shown at the bottom of the figure.

3.1.1. Breakage probability

The breakage probability is dependent both on the physical distance separating markers and the irradiation dose used to generate the panel: for a given X-ray dose, the closer the two markers are, the lower the probability of breakage between them and the larger the probability of coretention or coelimination. The breakage probability varies from 0 if the markers are at the same position to 1 if markers are segregating independently since their co-retention is only dependent on the retention probability.

Distances in RH maps are measured in cRay$_x$: 1 cRay$_x$ corresponds to a breakage probability of 1% between two markers in a panel constructed with an irradiation dose of $x$ Rad. In theory, it is possible to control the resolution of the panel by controlling the irradiation dose; however, it should be noted that the reproducibility of a given irradiation dose among laboratories appears to be far from perfect, and the DNA fragmentation obtained for a given dose seems to depend on the donor cells. Therefore, the resolution of a 3000-Rad panel may be the same as that of a 5000-Rad panel or even higher (see Table 1). Moreover, the expected resolution has to be adjusted according to the number of markers available or expected.

3.1.2. Retention probability

The different X-ray-induced “recombinant” or “nonrecombinant” chromosomal segments from the donor cell can be detected only if they are retained in the RH cell lines considered (see Figure 3). Thus, the probability of retention introduces a second level of control to achieve a good resolution for the RH panel. This parameter can be compared to the number of individuals needed in a linkage mapping experiment to obtain a sufficient number of informative meioses. Since the mechanisms of elimination of chromosomal segments during RH cell growth remain poorly understood, the retention probability of the panel is in practice controlled during its construction, by selecting among the available RH cell lines (for instance, by screening a small set of independent markers to estimate the overall retention probability of each RH cell line produced). This selection should be done carefully to avoid introducing a bias in further analyses.

3.2. Parameters estimation

Statistical methodologies for RH mapping are identical to the ones used classically in linkage mapping and were developed at the beginning of the 1990s (Boehnke et al., 1991; Cox et al., 1990; Lunetta and Boehnke, 1994; Lange et al., 1995). A description of these methodologies follows.

Screening the RH panel for a set of $N$ markers permits the identification, for each pair of markers $A_i$ and $A_j$ ($i$ and $j$ varying from 1 to $N$), of four different types of RH cell lines: $A_i^+A_j^+$, $A_i^+A_j^-$, $A_i^-A_j^+$, and $A_i^-A_j^-$, corresponding to the lines having retained, respectively, both $A_i$ and $A_j$, $A_i$ but not $A_j$, $A_j$ but not $A_i$, and neither $A_i$ nor $A_j$ (Figure 3). However, the parameters of interest cannot be estimated directly from the observed numbers of lines of these four types, since breakage and marker retention cannot be directly distinguished from each other. Nevertheless, the expected number of lines in each class can be calculated using as unknown parameters the breakage probability $\theta_{ij}$ between each pair of markers $A_i$ and $A_j$, the retention probabilities $r_i$ and $r_j$ of markers $A_i$ and $A_j$, and the coretention probability $r_{ij}$ ($i \neq j$) of $A_i$ and $A_j$. These parameters are then estimated by maximizing the likelihood of the observations (see Cox et al., 1990 for detailed equations). Finally, assuming randomness of breakage events and no interference between them, breakage occurrence can be modeled as a Poisson process. The distance $d_{ij}$ between markers $A_i$ and $A_j$ is thus $d_{ij} = -\ln(1 - \theta_{ij})$ in Ray$_x$, that is, $100 \times d_{ij}$ in cRay$_x$ ($x$ being the irradiation dose in Rad). This function is analogous to the Haldane mapping function used in linkage mapping, and computations of likelihood functions require the assumption of the absence of multiple breakages between markers, which are responsible for the nonadditivity of two-point distances for physically distant markers. However, in their pioneering experiment (14 markers covering 20 Mb on HSA21), Cox et al. (1990) showed that this effect appeared to be nonsignificant or negligible.
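As an illustration of two-point estimation, the following Python sketch treats the simplest case, in which both markers are assumed to share a single retention probability `r`. This is an assumption made here for brevity, not the general model of Cox et al. (1990). Under it, a marker is retained with probability `r` and the two markers give discordant results with probability `2 * theta * r * (1 - r)`, so both parameters can be recovered from the observed class counts. The function names are hypothetical, and the factor 100 converts the Poisson distance to centiRays.

```python
import math


def estimate_two_point(n_pp: int, n_pm: int, n_mp: int, n_mm: int):
    """Estimate (theta, r) from the counts of ++, +-, -+ and -- lines.

    Assumes one common retention probability r for both markers.
    """
    n = n_pp + n_pm + n_mp + n_mm
    if n == 0:
        raise ValueError("no informative lines")
    # Each marker is scored once per line: 2n observations in total.
    r = (2 * n_pp + n_pm + n_mp) / (2 * n)
    if r <= 0.0 or r >= 1.0:
        raise ValueError("retention of 0 or 1 carries no breakage information")
    # Discordant lines occur with probability 2 * theta * r * (1 - r).
    theta = (n_pm + n_mp) / (2 * n * r * (1 - r))
    return min(theta, 1.0), r


def distance_centiray(theta: float) -> float:
    """Poisson (Haldane-like) distance in centiRays for breakage probability theta."""
    if theta >= 1.0:
        return math.inf  # markers behave as unlinked
    return -100.0 * math.log(1.0 - theta)


theta_hat, r_hat = estimate_two_point(30, 5, 5, 60)
d_hat = distance_centiray(theta_hat)
```

For small breakage probabilities the distance in centiRays is close to the breakage probability expressed as a percentage, which is consistent with the definition of the unit.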

3.3. Linkage group construction (two-point analysis)

The first step in the construction of a map is to identify markers belonging to the same chromosome or to the same genomic region. Thus, two-point analyses consist of identifying, at a given threshold, a group of markers that are genetically linked. As in linkage mapping, for each of the $N(N-1)/2$ pairs of markers $A_i$ and $A_j$, linkage is evaluated by calculating the Lod score ($\mathrm{Lod}_{ij}$), which corresponds to the log ratio of the likelihood $L^1_{ij}$ of the data assuming linkage between $A_i$ and $A_j$ (alternative hypothesis $H_1$) to the likelihood $L^0_{ij}$ of the data under the null hypothesis ($H_0$) of no linkage. Parameter values are those estimated as before for $H_1$, while a breakage probability of 1 is assumed to calculate the likelihood under $H_0$.
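A two-point Lod score can be sketched as follows in Python. The sketch assumes a single retention probability `r` shared by both markers (a simplification chosen here, not the general model), estimates `theta` and `r` from the class counts, and compares the likelihood of the counts under the estimated `theta` with their likelihood under `theta = 1`. The function names are hypothetical, and the logarithm is taken in base 10.

```python
import math


def class_probabilities(theta: float, r: float):
    """Probabilities of ++, +-, -+ and -- lines under an equal-retention model."""
    p_pp = (1 - theta) * r + theta * r * r
    p_pm = theta * r * (1 - r)
    p_mm = (1 - theta) * (1 - r) + theta * (1 - r) ** 2
    return p_pp, p_pm, p_pm, p_mm


def two_point_lod(n_pp: int, n_pm: int, n_mp: int, n_mm: int) -> float:
    """Lod score of linkage (estimated theta) against no linkage (theta = 1)."""
    counts = (n_pp, n_pm, n_mp, n_mm)
    n = sum(counts)
    if n == 0:
        return 0.0
    r = (2 * n_pp + n_pm + n_mp) / (2 * n)
    if r <= 0.0 or r >= 1.0:
        return 0.0  # a marker always or never retained is uninformative
    theta = min(1.0, (n_pm + n_mp) / (2 * n * r * (1 - r)))
    p_linked = class_probabilities(theta, r)
    p_unlinked = class_probabilities(1.0, r)
    # Classes with a zero count contribute nothing to the log-likelihood ratio.
    return sum(
        c * math.log10(p1 / p0)
        for c, p1, p0 in zip(counts, p_linked, p_unlinked)
        if c > 0
    )


lod_linked = two_point_lod(30, 5, 5, 60)      # markers mostly coretained
lod_unlinked = two_point_lod(25, 25, 25, 25)  # counts compatible with independence
```

In this toy example, the first pair of markers gives a Lod score well above a threshold of 3, while the second gives a Lod score of zero because the estimated breakage probability reaches 1.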

A linkage group at the significance threshold $S$ is defined as a set of $L$ markers $A_k$ ($k$ varying from 1 to $L$, with $L \leq N$) such that, for each marker $A_k$, at least one other marker $A_l$ of the group ($l$ varying from 1 to $L$) gives $\mathrm{Lod}_{kl} > S$. The number of linkage groups thus increases with $S$. It should be noted that for a given threshold (for instance $S = 3$), the linkage criterion is generally more stringent than in classical linkage mapping (Cox et al., 1990).
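The grouping rule above amounts to single-linkage clustering on the pairwise Lod scores. A minimal Python sketch, using a union-find structure, is given below. The function name `linkage_groups` and the input format (a dictionary mapping marker index pairs to Lod scores) are assumptions made for this illustration.

```python
def linkage_groups(n_markers: int, lod: dict, threshold: float):
    """Group markers so that each one is linked (Lod > threshold) to at least
    one other marker of its group; isolated markers form singleton groups."""
    parent = list(range(n_markers))

    def find(i: int) -> int:
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), score in lod.items():
        if score > threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i  # merge the two groups

    groups = {}
    for marker in range(n_markers):
        groups.setdefault(find(marker), []).append(marker)
    return sorted(groups.values())


# Toy pairwise Lod scores for five markers (pairs not listed are treated as unlinked).
toy_lod = {(0, 1): 5.0, (1, 2): 4.0, (2, 3): 1.0, (3, 4): 6.0}
groups_at_3 = linkage_groups(5, toy_lod, 3.0)
groups_at_5_5 = linkage_groups(5, toy_lod, 5.5)
```

With these toy values, raising the threshold splits the groups further, which illustrates why the number of linkage groups increases with the threshold.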

3.4. Determination of the order of markers inside a linkage group

As for linkage mapping, two families of methods are used to order markers inside linkage groups: nonparametric and parametric methods (Boehnke, 1992; Boehnke et al., 1991; Lange et al., 1995).

Nonparametric methods are based on an intuitive parsimony criterion, which consists of finding the marker order that results in the minimum number of breakages in the RH cell lines of the panel. For instance, let us consider $N = 10$ markers screened on a cell line giving the result vector h = [1 1 1 9 0 9 0 0 1 1] (1 if the marker is present, 0 if the marker is absent, and 9 if the status of the marker is unknown). To explain h, at least two breakage points are necessary (one between markers 3 and 5 and one between markers 8 and 9; unknown statuses are ignored). This minimal number of breakages is called the number of Obligate Chromosomal Breaks (OCB) and represents a lower bound on the actual number of breakages, since double recombinants are ignored. To evaluate the OCB, one needs to count the number of times a 0 (respectively 1) is followed by a 1 (respectively 0) in the result vector. The main advantage of this method is that the model considered is not restrictive (the only constraint is to ignore double recombinants), rather intuitive, and not computationally intensive. However, it provides no information about distances between markers.
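The obligate break count can be sketched in Python as follows, using the coding described above (1 for present, 0 for absent, 9 for unknown). The function names are hypothetical.

```python
UNKNOWN = 9  # code used for a marker whose status could not be determined


def obligate_breaks(vector) -> int:
    """Number of obligate breaks in one cell line for markers in the given order.

    Unknown statuses are skipped, then each 0->1 or 1->0 change between
    consecutive known statuses counts as one break.
    """
    known = [status for status in vector if status != UNKNOWN]
    return sum(1 for a, b in zip(known, known[1:]) if a != b)


def panel_obligate_breaks(panel) -> int:
    """Total number of obligate breaks over all cell lines of a panel."""
    return sum(obligate_breaks(line) for line in panel)


# Result vector from the text: two breaks are required to explain it.
h = [1, 1, 1, 9, 0, 9, 0, 0, 1, 1]
n_breaks = obligate_breaks(h)
```

Summing this count over all lines of the panel gives the criterion that a nonparametric method tries to minimize when comparing marker orders.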

Parametric approaches are generally based on the maximum likelihood principle. Thus, they define a stochastic model to sort the different possible orders according to the likelihood of the data. Several models have been proposed that differ in the number of parameters to be estimated. In general, the hypotheses aim at restricting the number of retention probabilities among markers. In the fullest model (Cox et al., 1990), named the “general retention model”, if $N$ markers are considered, retention probabilities are estimated for all the $N(N+1)/2$ possible chromosomal segments ($N$ possible segments with 1 marker; $N-1$ with 2 markers; …; 1 possible segment containing all $N$ markers). This model thus includes $(N^2 + 3N - 2)/2$ parameters ($N-1$ breakage probabilities and $N(N+1)/2$ retention probabilities). When $N$ gets bigger, it becomes quite computationally intensive and overparameterized, and it can only be applied in the case of “haploid” RH cell lines. Therefore, other simpler models have been proposed: the “equal retention probability model”, the “centromeric or telomeric retention model”, the “left-endpoint model”, and the “selected locus model” (Bishop and Crockford, 1992; Chakravarti and Reefer, 1992; Lawrence and Morton, 1992; Boehnke et al., 1991; Boehnke, 1992; Lunetta et al., 1996). Some of these models are nested and can thus be compared with each other.
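The parameter count of the general retention model can be checked with a short Python sketch that counts contiguous segments directly: for an ordered set of N markers there are N - k + 1 segments containing exactly k consecutive markers. The function name is hypothetical.

```python
def general_retention_parameter_count(n_markers: int) -> int:
    """Number of parameters of the general retention model for n ordered markers."""
    # One retention probability per contiguous segment of k = 1..n markers.
    n_segments = sum(n_markers - k + 1 for k in range(1, n_markers + 1))
    # One breakage probability per interval between adjacent markers.
    n_breakage = n_markers - 1
    return n_breakage + n_segments


# Example: the 14 markers of the HSA21 experiment mentioned earlier.
n_params_14 = general_retention_parameter_count(14)
```

The result agrees with the closed form (N^2 + 3N - 2)/2 and shows how quickly the model grows with the number of markers.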

Maximization of the likelihood for the different orders requires algorithmic computation. Although comparing the likelihoods of the different possible orders makes it possible to rank them, the difference between the log-likelihoods of two consecutive orders may be nonsignificant (for instance, at the significance threshold of 3). To circumvent this problem, a “framework map” can be constructed by selecting $K$ markers among the $N$ available such that the log-likelihood of the best order found exceeds that of the second-best order by at least the chosen threshold. Several software packages offer options to compute framework maps.

In the case of “diploid” hybrid cell lines, it is not possible (in general) to distinguish between lines having one copy of each marker and lines having two copies of the marker (each one from one of the two chromosome homologs of the donor cell). Thus, for parametric models, likelihood computation requires a hidden Markov chain algorithm (Lange et al., 1995). Nevertheless, in most cases, analysis of data from a “diploid” RH panel using haploid models does not seem to introduce differences in the best final order found (Ben-Dor et al., 2000) except a small underestimation of the distances between markers (about 5%).

Finally, when considering $N$ markers, there are $N!/2$ possible orders to explore. Thus, whichever model is chosen, evaluating all these orders to find the best one quickly becomes impossible. Some algorithms derived from combinatorial optimization under constraint (here either the number of OCB or the likelihood according to the model) were developed to decrease the number of orders to explore. A first class of methods based on the complete set of markers was proposed, such as the “branch and bound”, “stepwise locus ordering”, or “simulated annealing” methods (Nijenhuis and Wilf, 1978; Kirkpatrick et al., 1983; Barker et al., 1987). More recently, Ben-Dor et al. (2000) developed an algorithm based on the “Travelling Salesman Problem” (Garey and Johnson, 1979; Cormen, 1990). All these methods are heuristic and, except for the “branch and bound” method, which is computationally intensive and practically impossible for a large set of markers ($N > 10$), they do not guarantee that the best order will be found. Other heuristic (or metaheuristic) methods were proposed to try to improve a given order and to test whether it is suboptimal, such as the “flip algorithm”, “Tabu search”, or “genetic algorithm” (Glover, 1986; Hansen, 1986; Holland, 1973; Barker et al., 1987).
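To make the size of the search space concrete, the following Python sketch enumerates all N!/2 orders of a very small marker set (an order and its mirror image are equivalent) and keeps the one with the fewest obligate breaks. It is a brute-force illustration only, not one of the published heuristics, and the function names and the toy panel are assumptions made for this example.

```python
from itertools import permutations

UNKNOWN = 9  # code for an unknown marker status


def obligate_breaks(vector) -> int:
    """Obligate breaks in one line: status changes between consecutive known markers."""
    known = [status for status in vector if status != UNKNOWN]
    return sum(1 for a, b in zip(known, known[1:]) if a != b)


def total_breaks(panel, order) -> int:
    """Total obligate breaks over the panel when markers are placed in `order`."""
    return sum(obligate_breaks([line[m] for m in order]) for line in panel)


def best_order_exhaustive(panel):
    """Return (best order, its break count, number of orders evaluated)."""
    n_markers = len(panel[0])
    best_order, best_score, evaluated = None, None, 0
    for order in permutations(range(n_markers)):
        if order[0] > order[-1]:
            continue  # skip the mirror image of an order already evaluated
        evaluated += 1
        score = total_breaks(panel, order)
        if best_score is None or score < best_score:
            best_order, best_score = order, score
    return best_order, best_score, evaluated


# Toy panel: each row is a cell line, each column a marker (indices 0 to 3).
toy_panel = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]
order, score, n_evaluated = best_order_exhaustive(toy_panel)
```

With four markers only 12 orders have to be scored, but the count grows factorially, which is why exhaustive enumeration is replaced by heuristics for realistic marker sets.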

In the end, once the best order is found, it is still possible to check the data by identifying unlikely recombinants, which can reveal genotyping mistakes. This type of a posteriori verification must, however, be undertaken carefully to avoid biasing the data.

3.5. Software

Several software packages are publicly available, and differ by the optimization algorithm or the options proposed. The most frequently used are:

• RHMAP (http://www.spn.umich.edu/group/statgen/software) contains three different programs, which can be freely downloaded: RH2PT (two-point analysis), RHMINBRK (order determination by a nonparametric approach), and RHMAXLIK (order determination by a parametric approach with almost all the models available).

• RHMAPPER (http://www.genome.wi.mit.edu/ftp/pub/software/rhmapper), which can also be freely downloaded. It uses a parametric strategy and a hidden Markov model to perform maximum likelihood calculations on multipoint maps (either on “haploid” or “diploid” panels). It is particularly suitable for large-scale mapping projects.

• RHO (http://www.cs.technion.ac.il/Labs/cbl/research.html), which should be used via a web interface. It uses the heuristic described in Ben-Dor et al. (2000), which can be applied on either parametric or nonparametric models.

• Carthagene (www.inra.fr/bia/T/CarthaGene/), which can be freely downloaded. This software is very user friendly and proposes many different construction procedures. It uses a parametric model with calculation times greatly decreased by an improvement of the EM algorithm (Schiex et al., 2002). Like RHO, it assumes that the panel is “haploid”, but this does not seem to have a great influence on the reliability of the orders found (see above).

4. Advantages of RH mapping

RH mapping methodologies have met great success in many different species. This can be explained by several advantages as compared with classical linkage mapping strategies.

First, a panel consisting of 100 RH cell lines is in most cases sufficient to offer a good representation of the genome of interest, and its resolution (understood as the threshold at which two closely located markers can be distinguished) is higher than that of linkage mapping analysis. Indeed, in linkage mapping, the resolution is directly dependent on the number of informative meioses in the pedigree analyzed (assuming an average correspondence of 1 cM to 1 Mb, distinguishing two markers 1 Mb apart requires in theory 100 informative meioses). If some markers are not informative in all the families of the pedigree analyzed, the number of individuals needed can therefore greatly exceed the number of informative meioses wanted. Thus, the size of the experiment very often limits the resolution to the order of the centimorgan. In contrast, and as explained above, controlling the irradiation dose and, to a lesser extent, the number of lines of the RH panel makes it possible to achieve a very fine resolution (down to less than 100 or 50 kb), the limit soon being the number of markers available. Additionally, as far as we know, no hot or cold spots of X-ray-induced breakage have been reported in genome-wide studies. As such, RH distances appear to be closely related to actual physical distances, and RH mapping is thus generally considered a physical mapping method.

In practice, an RH panel is screened using several types of molecular methods: enzymatic expression analysis, probe hybridization, or, more frequently, PCR screening, which is easy and fast. One big advantage is that markers do not have to be polymorphic, and hence all kinds of STSs (sequence-tagged sites) and, in particular, coding sequences such as ESTs (expressed sequence tags) can be easily mapped on a broad scale. Additionally, detailed maps can be built for the non-pseudoautosomal part of sex chromosomes such as the Y chromosome in mammals (Liu et al., 2002). These characteristics make RH panels an efficient and powerful tool to draw comparative maps through the use of comparative anchoring markers (O’Brien et al., 1993; Yang and Womack, 1998; Everts-van der Wind et al., 2004). The main principle is to choose markers in coding regions, permitting an easier identification of orthologies among genomes when the whole genome has not been sequenced. The only limiting factor of RH mapping is that if the species of the donor cells is closely related to the species of the receptor cells, the genome of the receptor cell may interact with the chosen probe. This can be avoided either by defining probes in the untranslated region of a gene, which has a lower level of sequence similarity (Wilcox et al., 1991) or, in the case of PCR-based probes, by amplifying introns (using exonic probes) to generate an interspecific polymorphism (the intron length being less conserved). It is also possible to use more complex detection methods such as SSCP (single-strand conformation polymorphism), but screening then becomes more time consuming and labor intensive.

Finally, RH mapping constitutes an efficient tool to speed up positional cloning of genes affecting traits of interest, particularly if the sequence of the genome considered is not yet available or if only limited genome coverage is available (up to 2x coverage). RH mapping makes it possible to integrate both linkage and comparative maps and thus to exploit both information sources efficiently. Indeed, the linkage mapping-related methodology permits the identification of genomic regions involved in the genetic determinism of the variation of a trait (QTL) using molecular markers and phenotypic information recorded on individuals of a given pedigree. An RH map including both markers used in linkage maps and comparative anchoring markers will permit anchorage of the identified genomic region on the genome of a different reference species, and thus make it possible to benefit from the functional information (identification of putative candidate genes) or, more generally, the mapping information available for this species.

5. Conclusion

Even if the completion of several whole-genome sequences appears to challenge the use of RH panels for positional cloning experiments, it should be noted that they have played an active and important part in the history and success of these different genome projects, for example in man (Olivier et al., 2001) or, more recently, in rat (Kwitek et al., 2004). Indeed, thanks to the high mapping resolution of RH panels, the assembly of sequence contigs can be greatly speeded up, and some inconsistencies resolved, by screening relevant markers (from BAC end sequences or ESTs, for instance) on a given panel.

Moreover, for most species for which no whole-genome sequence is available, the construction and use of an RH panel constitute a powerful tool for positional cloning strategies and, more generally, for making progress in genomic approaches. Notably, it offers a very fine resolution, intermediate between that of linkage maps and that of BAC-based physical maps. The ease of screening and the fact that markers do not need to be polymorphic make it possible to develop fine whole-genome comparative maps. Thus, the species of interest can benefit from another, better-characterized species to quickly build a physical map (Murphy et al., 2001). The only remaining limit is the number of available genetic markers, but the resolution of the panel can be adjusted by controlling the irradiation dose.
