YAC-STS content mapping (Genomics)

1. STS marker

Sequence tagged site (STS) markers represent short unique DNA sequences for which a specific PCR (polymerase chain reaction) assay can be designed, so that any DNA sample can be easily tested for the presence or absence of this specific DNA fragment. One of the hallmarks of an STS is that it maps to a single location in the genome. STSs were proposed as a "common language for physical mapping" (Olson et al., 1989) and have emerged as the principal markers used in a variety of mammalian maps (see Article 9, Genome mapping overview, Volume 3). The principle of STS mapping is to build clone maps based on STS content, so that different maps can be correlated with one another.

A YAC (yeast artificial chromosome)-based STS content mapping approach was first used to construct a map of human chromosome 7 (Green et al., 1991). The strategy was based on mapping large segments of human DNA using YACs as the source of cloned DNA and STSs as landmarks to order these clones and anchor them to radiation hybrid maps. The principal disadvantage of any STS marker approach, compared with hybridization-based markers, lies in the cost of acquiring single-copy sequences and synthesizing oligonucleotides specific to each STS.

2. STS marker development

PCR-based STS content mapping is a highly robust and straightforward technique. It can be applied to all kinds of cloning systems and mapping panels. The scale of STS development can vary from creating a single STS corresponding to an interesting gene sequence for the purpose of locating the gene in the genome, to creating a large number of STSs from a particular chromosome or subchromosomal region as part of a physical mapping project.


Regardless of the origin of STSs or scale of a project, the major steps of STS development are:

1. acquisition of sequence

2. assessment of sequence by comparison with databases

3. selection of PCR primers

4. development of a PCR assay

5. assessment of the uniqueness of an STS by limited mapping

6. assessment of STS quality.

Many genomes contain abundant repetitive sequences, including interspersed repeats, satellite sequences, and gene families, which should be avoided when developing an STS. Sequences can be compared with databases of known repeat sequences, for example, by using the web-accessible program RepeatMasker (Smit et al., 1996-2004).
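The sketch below illustrates the idea behind repeat screening: any part of a candidate sequence that matches a known repeat is masked, so it cannot be used for primer design. It is a toy k-mer matcher, not RepeatMasker and not a model of its algorithm. The 19-nt "repeat library", the flanking sequence, and the default k = 12 are all hypothetical values chosen for illustration.

```python
# Toy k-mer repeat masking for illustration only; this is NOT RepeatMasker.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mask_repeats(seq: str, repeats: list[str], k: int = 12) -> str:
    """Replace with 'N' every base covered by a k-mer that also occurs in one
    of the repeat sequences, on either strand."""
    seq = seq.upper()
    repeat_kmers = set()
    for rep in repeats:
        for strand in (rep.upper(), revcomp(rep.upper())):
            for i in range(len(strand) - k + 1):
                repeat_kmers.add(strand[i:i + k])
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in repeat_kmers:
            masked[i:i + k] = "N" * k
    return "".join(masked)


def unique_fraction(masked: str) -> float:
    """Fraction of bases left unmasked, i.e., still available for primer design."""
    return 1 - masked.count("N") / len(masked)


# Hypothetical one-entry repeat "library" embedded in a hypothetical query.
toy_repeat = "GGCCGGGCGCGGTGGCTCA"
query = "ATATATATAT" + toy_repeat + "TTTTAAAATT"
masked = mask_repeats(query, [toy_repeat])
print(masked, round(unique_fraction(masked), 2))
```

Matching on both strands matters because a repeat can be inserted in either orientation.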

Primers should be designed within unique sequence. Several algorithms are available for selecting oligonucleotides for PCR (e.g., PRIMER, OLIGO). PCR products used to detect an STS are usually 100-1000 bp in size, with an average of approximately 200 bp. If multiplex PCR is used to score multiple STSs simultaneously, a range of product sizes should be chosen so that the individual products can be resolved from the reaction mixture.
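A minimal sketch of the sanity checks implied above follows. It checks that a primer pair gives a product in the 100-1000 bp window and that multiplexed product sizes are far enough apart to resolve. Several parts are assumptions. The Wallace rule, Tm = 2(A+T) + 4(G+C), is only a rough guide, best suited to short oligonucleotides. Primer binding is modeled as exact matching. The 20-bp minimum spacing for multiplexing is a hypothetical threshold. All sequences are invented.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C): 2 per A/T plus 4 per G/C."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def amplicon_size(template: str, fwd: str, rev: str) -> Optional[int]:
    """Product length if fwd matches the top strand and rev matches the bottom
    strand downstream of it (exact matches only); otherwise None."""
    start = template.find(fwd)
    if start < 0:
        return None
    rev_site = template.find(revcomp(rev), start + len(fwd))
    if rev_site < 0:
        return None
    return rev_site + len(rev) - start


def multiplex_compatible(sizes: list[int], min_gap: int = 20) -> bool:
    """True if all product sizes differ by at least min_gap bp (hypothetical cutoff)."""
    ordered = sorted(sizes)
    return all(b - a >= min_gap for a, b in zip(ordered, ordered[1:]))


fwd, rev = "ACGTGCTAGCTAGGCT", "TTGCAGCTAGGATCCA"
template = "A" * 10 + fwd + "C" * 150 + revcomp(rev) + "G" * 10
size = amplicon_size(template, fwd, rev)
print(size, 100 <= size <= 1000, wallace_tm(fwd), gc_fraction(fwd))
```

Dedicated tools such as Primer3 use nearest-neighbor thermodynamics instead of the Wallace rule. They also check for primer dimers and hairpins, which this sketch ignores.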

When primers are selected from a cDNA sequence, it is important to choose oligonucleotides that are likely to lie within a single exon, so that genomic DNA can serve as a template for amplification of the STS. Because PCR-based STS content mapping requires primer design and oligonucleotide synthesis for each marker studied, it increases the cost of a project. Furthermore, PCR has to be performed on every clone of a library, followed by detection of the products, which results in a large number of reactions. For screening whole libraries, pooling the clones circumvents this problem and drastically reduces the number of PCRs required (see Figure 1).
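The single-exon requirement can be checked mechanically once the exon structure is known, for example from an alignment of the cDNA to genomic sequence. The sketch below assumes hypothetical exon lengths. It uses 0-based cDNA coordinates with an inclusive start and exclusive end, and it tests whether an amplicon lies entirely within one exon.

```python
from bisect import bisect_right
from itertools import accumulate


def exon_index(pos: int, exon_lengths: list[int]) -> int:
    """0-based index of the exon that contains cDNA position pos (0-based)."""
    starts = [0] + list(accumulate(exon_lengths))[:-1]
    return bisect_right(starts, pos) - 1


def amplicon_within_one_exon(start: int, end: int, exon_lengths: list[int]) -> bool:
    """True if cDNA interval [start, end) does not cross an exon boundary."""
    return exon_index(start, exon_lengths) == exon_index(end - 1, exon_lengths)


exons = [120, 300, 500]  # hypothetical exon lengths (bp) along the cDNA
print(amplicon_within_one_exon(150, 350, exons))  # both ends in exon 2
print(amplicon_within_one_exon(100, 250, exons))  # spans the exon 1/2 boundary
```

If an amplicon spans an exon boundary, the genomic product would contain an intron. It would then be longer than the cDNA product, or absent if the intron is large.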

The next step in STS generation includes the development of a robust PCR assay that can be carried out under standardized reaction conditions. A robust and standardized assay is especially important for projects that involve the generation of more than a small number of STSs because a large number of PCR assays will be carried out.

As an alternative to PCR, genomic libraries can be spotted on high-density gridded filters, and several thousands of clones can be screened in a single hybridization step (see Section 4.2).

3. Large-insert clone libraries

With the availability of one or more closely linked DNA markers from a genomic region of interest, one can begin to develop a contig of overlapping clones that spans the region. A cloned contig not only provides information on physical distances but can also be used as the raw material from which positional cloning of a phenotypically defined locus can proceed. The generation of a contig is pursued most efficiently by screening a large-insert genomic library. Although a number of systems for generating large-insert libraries have been described, to date, bacterial artificial chromosome (BAC) and YAC cloning systems have been used most widely.

The development of YAC cloning technology, first implemented by David Burke and Maynard Olson at Washington University in St. Louis (Burke et al., 1987), has enabled the cloning of very large fragments of exogenous DNA, up to 2 Mb in size. It has thus directly strengthened the links between genetic, physical, and functional mapping of genomes. YAC cloning systems are based on yeast plasmids that contain DNA sequences functioning as telomeres (TEL), together with a yeast origin of replication (ARS) and a centromere (CEN). “Artificial” yeast chromosomes are formed by ligating random, large fragments of genomic DNA between two arms, one carrying a telomere and a centromere and the other a telomere alone, with a selectable marker on each arm. These YAC constructs are introduced into yeast by transformation. There they segregate alongside the host chromosomes into both daughter cells at each mitotic division.


Figure 1 Three-dimensional pooling of clone libraries: A YAC library is usually stored in 96-well plates: one stack consists of 8 plate pools, 8 row pools, and 12 column pools, and PAC/BAC libraries are stored in 384-well plates: one stack consists of 8 plate pools, 16 row pools, and 24 column pools. By pooling clone libraries, the number of PCRs for the library screening is significantly reduced
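The saving described in the Figure 1 caption is simple arithmetic. A stack of P plates with R rows and C columns holds P × R × C clones but needs only P + R + C pooled PCRs. The sketch assumes one stack of 8 plates, as the caption implies.

```python
def pool_design(plates: int, rows: int, cols: int) -> tuple[int, int]:
    """Return (clones per stack, PCRs needed to screen the stack with 3-D pools)."""
    return plates * rows * cols, plates + rows + cols


layouts = {"96-well YAC": (8, 8, 12), "384-well PAC/BAC": (8, 16, 24)}
for label, (p, r, c) in layouts.items():
    clones, pcrs = pool_design(p, r, c)
    print(f"{label}: {clones} clones screened with {pcrs} pooled PCRs instead of {clones}")
```

This gives 28 instead of 768 reactions for a 96-well stack, and 48 instead of 3,072 for a 384-well stack. The count holds only when a single clone in the stack is positive. With several positives, the pools no longer identify a unique address, and additional PCRs are needed.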

The construction of a YAC library proceeds very differently from that of most other types of genomic libraries. Every clone in the library must be picked individually and placed into a separate compartment (e.g., a well of a microtiter dish). This process is extremely time-consuming and labor-intensive, but once a library has been arrayed with individual clones in individual wells, it is essentially immortal. For this and other reasons, it makes good sense to screen established libraries for a gene of interest rather than to create a new library.

The first human YAC library to be described had a 2.2-fold genomic coverage and an average insert size of ~265 kb, and was distributed freely to the entire scientific community (Burke et al., 1987).
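To put "2.2-fold coverage" in perspective, here is a minimal sketch under an idealized model: clones are assumed to be random and unbiased. Under that model, the chance that a given locus is present in a library of c-fold coverage is about 1 − e^(−c). The Clarke-Carbon formula N = ln(1 − P) / ln(1 − f) gives the number of clones N needed to reach probability P, where f is the insert size divided by the genome size. The 3.0 × 10^9 bp genome size is an assumed round figure, and real libraries have cloning biases this model ignores.

```python
import math


def prob_locus_represented(coverage: float) -> float:
    """Poisson approximation: P = 1 - exp(-c) for a c-fold coverage library."""
    return 1 - math.exp(-coverage)


def clones_needed(p: float, insert_size: float, genome_size: float) -> int:
    """Clarke-Carbon estimate N = ln(1 - P) / ln(1 - f), with f = insert/genome."""
    f = insert_size / genome_size
    return math.ceil(math.log(1 - p) / math.log(1 - f))


print(round(prob_locus_represented(2.2), 3))  # 0.889
print(clones_needed(0.99, 265e3, 3.0e9))      # ~52,000 clones of 265 kb
```

So under these assumptions, roughly one locus in nine would be missing from a 2.2-fold library. Reaching 99% probability would take about 4.6-fold coverage.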

Although YAC clones have facilitated the construction of long-range physical maps, the YAC cloning system is not perfect. A proportion of the clones in a YAC library are chimeric; that is, their inserts are composed of two or more unrelated genomic fragments that have become ligated together as an artifact of the cloning process. Chimeric clones must be identified before one can begin to generate a physical map. These disadvantages (e.g., chimerism and insert instability) can limit the utility of YAC libraries and restrict their applications.

Two other systems for cloning large genomic inserts have been described more recently: PAC (bacteriophage P1-derived artificial chromosomes) (Ioannou et al., 1994) and BAC (bacterial artificial chromosomes) (Shizuya and Kouros-Mehr, 2001). Both offer high clonal stability and reduced cloning bias, and their DNA is easily purified for sequencing.

The PAC system is based on the use of bacteriophage P1 as a cloning vector (Pierce et al., 1992; Pierce and Sternberg, 1992). This system has been used to obtain a mouse genomic library with average inserts in the range of 75-95 kb and a maximum cloning capacity of 100 kb. The P1 cloning system has two advantages over YACs. First, its cloning efficiency is much higher. Second, like other bacterial cloning systems, it allows large amounts of clone DNA to be purified efficiently away from the rest of the bacterial genome. The utility of this cloning system in analyzing genomic organization within the H2 region has been demonstrated (Gasser et al., 1994).

The BAC (bacterial artificial chromosome) system is derived from the well-studied E. coli F factor, which is essentially a naturally occurring single-copy plasmid (Shizuya et al., 1992). This plasmid has been converted into a vector that can accept inserts of more than 300 kb, with reported average insert sizes of 200-300 kb. The BAC system shares the advantages of P1 and has the added advantage of a larger potential insert size.

4. Physical mapping

Physical mapping is a means of mapping genes or DNA sequences on a chromosome without relying on meiotic segregation, as genetic mapping does. Physical maps provide molecular access to chromosomal regions of interest and therefore facilitate the positional cloning of genes. Ideally, the series of clones should be gap-free and form a contiguous clone array. For the analysis of a specific chromosomal region, such as a QTL region, a high-resolution physical map is desirable. The identified clones are assembled to provide a full representation of the region of interest.

Common physical mapping strategies are based on the PCR screening of large-insert cloning libraries using appropriate markers. An extremely powerful and convenient tool for ordering sequences according to their chromosomal position is a radiation hybrid panel (see Article 14, The construction and use of radiation hybrid maps in genomic research, Volume 3). The localization of any STS can be achieved by PCR amplification and scoring of a particular radiation hybrid panel. The obtained patterns are compared with patterns of previously mapped markers held on a central server.
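The sketch below shows one common way to turn radiation hybrid (RH) patterns into a distance. It uses a two-point model and assumes that every fragment is retained with the same probability r, independently once two markers have been separated by a break. Under that model, two markers are discordant in a hybrid with probability θ · 2r(1 − r), where θ is the breakage probability. The distance in centiRays is −100 ln(1 − θ). The retention patterns and marker names are hypothetical. The nearest-marker lookup simply minimizes the number of discordant hybrids; real RH servers use likelihood-based methods and handle ambiguous scores.

```python
import math


def rh_two_point(a: list[int], b: list[int]) -> tuple[float, float]:
    """Two-point RH estimate for markers scored 1 (retained) or 0 (absent) on
    the same hybrids. Returns (theta, distance in centiRays)."""
    if len(a) != len(b) or not a:
        raise ValueError("patterns must be non-empty and of equal length")
    n = len(a)
    r = (sum(a) + sum(b)) / (2 * n)               # pooled retention frequency
    unlinked_discordance = 2 * r * (1 - r)        # expected if always separated
    if unlinked_discordance == 0:
        raise ValueError("markers retained in all or no hybrids are uninformative")
    discordance = sum(x != y for x, y in zip(a, b)) / n
    theta = discordance / unlinked_discordance    # estimated breakage probability
    if theta >= 1:
        return theta, math.inf                    # no evidence of linkage
    return theta, -100 * math.log(1 - theta)


def closest_mapped_marker(query: list[int], mapped: dict[str, list[int]]) -> str:
    """Previously mapped marker with the fewest discordant hybrids."""
    return min(mapped, key=lambda name: sum(x != y for x, y in zip(query, mapped[name])))


new_sts = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]          # hypothetical 10-hybrid panel
mapped = {"M1": [1, 1, 1, 0, 0, 0, 0, 0, 1, 1], "M2": [0] * 10}
print(closest_mapped_marker(new_sts, mapped), rh_two_point(new_sts, mapped["M1"]))
```

In this example the markers are discordant in 2 of 10 hybrids with r = 0.5. That gives θ = 0.4 and a distance of about 51 cR.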

4.1. PCR-based STS mapping

STS content mapping involves scoring a series of clones for the presence or absence of particular STSs. STS content mapping is often used as a tool to assemble clone contigs. Large-insert genomic libraries (e.g., YAC libraries) are usually screened by PCR to amplify specific sequences contained within the genomic clones. These STSs serve as unique identifiers to ascertain whether the DNA fragments are located within other genomic clones to establish an overlap between clones. The number of PCRs for library screening can be significantly reduced by pooling YAC clones (see Figure 1).
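Pooling saves reactions, but the positive pools must then be deconvoluted back into clone addresses. The minimal sketch below assumes each clone has a (plate, row, column) address, as in Figure 1. It lists every address consistent with the positive pools. One positive clone yields a single candidate. Two clones that differ in every coordinate yield 2 × 2 × 2 = 8 candidates, only two of them real, and these would need to be resolved by testing individual clones. The clone addresses are hypothetical.

```python
from itertools import product


def pool_results(positive_clones: list[tuple[int, str, int]]):
    """Plate, row, and column pools that give a PCR product, given the true
    positive clone addresses (plate, row, column)."""
    plates = {p for p, _, _ in positive_clones}
    rows = {r for _, r, _ in positive_clones}
    cols = {c for _, _, c in positive_clones}
    return plates, rows, cols


def candidate_clones(plates, rows, cols) -> list[tuple[int, str, int]]:
    """All clone addresses consistent with the positive pools."""
    return sorted(product(plates, rows, cols))


print(candidate_clones(*pool_results([(3, "C", 5)])))                # unique
print(len(candidate_clones(*pool_results([(1, "A", 1), (2, "B", 2)]))))  # ambiguous
```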

Chromosomal walking by clone-clone hybridization is not practically feasible with mammalian YACs. The inserts contain so much repetitive DNA that blocking the repetitive signal during hybridization is technically difficult. Instead, short end fragments are recovered from individual YACs, for example, by restriction enzyme digestion or PCR amplification. Usually, the YAC DNA is cleaved with a restriction enzyme that is known to cut within the YAC vector sequence. Among the cleavage products are fragments that contain both the unknown terminal sequence of the insert DNA and the adjacent known vector sequence. Such fragments can be amplified by various PCR-based methods (e.g., inverse PCR, vectorette PCR). These methods use a primer that binds to the known vector sequence to gain access to the adjacent uncharacterized insert sequence.
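The logic of isolating an end fragment can be sketched as a digest of a toy construct. The code finds the restriction fragment that spans the vector/insert junction, which contains known vector sequence (for primer binding) next to unknown insert sequence. EcoRI and its G^AATTC cut are real. The construct, the vector-arm stand-in, and the insert are hypothetical and far shorter than a real YAC. Only top-strand cut positions are modeled.

```python
from typing import Optional

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; cuts G^AATTC
ECORI_OFFSET = 1       # the cut falls after the first base of the site


def cut_positions(seq: str, site: str = ECORI_SITE, offset: int = ECORI_OFFSET) -> list[int]:
    """Top-strand cut coordinates for every occurrence of the recognition site."""
    cuts, i = [], seq.find(site)
    while i != -1:
        cuts.append(i + offset)
        i = seq.find(site, i + 1)
    return cuts


def junction_fragment(seq: str, junction: int, site: str = ECORI_SITE,
                      offset: int = ECORI_OFFSET) -> Optional[tuple[int, int]]:
    """(start, end) of the restriction fragment spanning the vector/insert
    junction, which lies between seq[junction - 1] and seq[junction]."""
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    for start, end in zip(bounds, bounds[1:]):
        if start < junction < end:
            return start, end
    return None


# Hypothetical construct: a short stand-in for a vector arm, then insert DNA.
vector_arm = "TTTTGAATTCGGGGCCCC"
insert = "ACACACACAC" + "GAATTC" + "TGTGTGTG"
construct = vector_arm + insert
start, end = junction_fragment(construct, len(vector_arm))
print("known vector part:", construct[start:len(vector_arm)])
print("unknown insert end:", construct[len(vector_arm):end])
```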

YAC insert end fragments can then be used as hybridization probes to screen colony filters from a YAC library. Alternatively, they can be sequenced and used to design oligonucleotide primers for a PCR assay. PCR can then be used again to screen pooled libraries to identify YACs with overlapping sequences. All positive clones from a YAC or other large-insert library can be sized by pulsed-field gel electrophoresis (PFGE), and fragments at both ends of each insert can be isolated rapidly by several standard protocols (Riley et al., 1990; Cox et al., 1993). End fragments from each clone should be used as probes in an initial test for chimerism. One way is to probe appropriate somatic cell hybrid lines to determine whether both ends map to the same chromosome as the original DNA marker used to isolate the clone. If suitable somatic cell hybrid lines are not available, the segregation of the end fragments can instead be tested on a panel of 20 interspecific (or intersubspecific) backcross samples. Complete concordance in the transmission of the two end fragments can be taken as strong evidence for nonchimerism, whereas two or more recombination events would be highly suggestive of a chimeric clone. Chimeric clones need not be discarded; it is only necessary to be aware of their nature when interpreting the data they generate.
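The backcross test above amounts to counting discordant progeny between the two end fragments. The minimal sketch below follows the rule stated in the text: zero recombinants support nonchimerism, and two or more suggest chimerism. Treating exactly one recombinant as "inconclusive" is our assumption, because the text does not address that case. The genotype codes "B" and "H" and the panel data are hypothetical.

```python
def count_recombinants(left_end: list[str], right_end: list[str]) -> int:
    """Number of backcross progeny in which the two end fragments were
    transmitted discordantly."""
    if len(left_end) != len(right_end):
        raise ValueError("both end fragments must be typed on the same panel")
    return sum(a != b for a, b in zip(left_end, right_end))


def chimerism_call(left_end: list[str], right_end: list[str]) -> str:
    n = count_recombinants(left_end, right_end)
    if n == 0:
        return "consistent with nonchimeric"
    if n >= 2:
        return "likely chimeric"
    return "inconclusive"  # a single recombinant: this cutoff is our assumption


# Hypothetical 20-animal backcross panel; "B" and "H" are arbitrary genotype codes.
left = ["B", "H"] * 10
right = list(left)
right[3] = "B" if right[3] == "H" else "H"
right[12] = "B" if right[12] == "H" else "H"
print(count_recombinants(left, right), chimerism_call(left, right))
```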

The process of deriving YAC clones from a library can be brought to a halt when the clones that have already been obtained include the locus being sought.

It is only possible to reach this conclusion when the derived contig extends over markers that map apart from the locus on both of its sides. In other words, the contig must extend across the two closest recombination break points that define the outer limits of localization. If cloning begins with a very dense map of markers placed onto a high-resolution cross, this endpoint is likely to be reached more quickly. With luck, it might even be reached with the first set of YACs obtained in the initial screening of the library.
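The stopping rule can be written as a one-line check. The sketch assumes the contig is gap-free. It also assumes each contig marker has a genetic position (cM), and that the two limits are the positions of the closest markers that showed recombination with the locus on either side. The marker names and positions are hypothetical.

```python
def contig_spans_locus(contig_markers: list[str], genetic_pos: dict[str, float],
                       left_limit: float, right_limit: float) -> bool:
    """True if a gap-free contig reaches both recombination break points,
    i.e., it contains a marker at or beyond each limit (positions in cM)."""
    positions = [genetic_pos[m] for m in contig_markers]
    return min(positions) <= left_limit and max(positions) >= right_limit


pos = {"D1": 10.0, "D2": 10.4, "D3": 10.9, "D4": 11.3}  # hypothetical cM positions
print(contig_spans_locus(["D2", "D3"], pos, 10.0, 11.3))              # keep walking
print(contig_spans_locus(["D1", "D2", "D3", "D4"], pos, 10.0, 11.3))  # locus enclosed
```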

The interrepeat or interspersed repetitive sequence (IRS)-PCR is another means of isolating sequences specific to a chromosomal region of interest, for example, from somatic cell hybrids. IRS-PCR is a special type of PCR in which genomic sequences located between two highly repetitive SINE elements are amplified (Ledbetter et al., 1990) (see Figure 2). The mammalian genome contains many families of highly repeated DNA sequences, which are largely transcriptionally inactive. A wide variety of repeats are known, and they fall into two major types of organization: tandemly repeated and interspersed repeated sequences. Tandemly repeated noncoding DNA is mainly grouped into three subclasses according to average size: satellite, minisatellite, and microsatellite DNA. Interspersed repetitive noncoding DNA sequences are not clustered but dispersed, and a number of different classes exist. Most of these families contain some members that are capable of retrotransposition (Deininger, 1989; Daniels and Deininger, 1985). Two major classes have been distinguished on the basis of repeat unit length. SINEs (short interspersed nuclear elements) have an average size of 0.1 to 0.3 kb, and LINEs (long interspersed nuclear elements) have an average size between 0.3 and 8 kb. The Alu element, with more than one million copies, is the most abundant SINE in humans. Because of this high copy number, the Alu family makes up more than 10% of the human genome (Lander et al., 2001).
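The Alu figure can be checked with back-of-envelope arithmetic. This assumes a typical Alu length of about 0.3 kb, the upper end of the SINE size range given above, and a haploid human genome of roughly 3,000 Mb, an assumed round figure:

```latex
\frac{\left(1.0\times10^{6}\ \text{copies}\right)\times\left(0.3\ \text{kb}\right)}{3.0\times10^{6}\ \text{kb}}
  = \frac{3.0\times10^{5}\ \text{kb}}{3.0\times10^{6}\ \text{kb}} = 0.10
```

One million copies alone therefore account for about 10% of the genome, consistent with the figure quoted.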

IRS-PCR strategies have been applied to various species using the SINE sequences in human (Alu repeat) (Nelson et al., 1991), mouse (B1 repeat) (Hunter et al., 1996; McCarthy et al., 1995), rat (ID repeat) (Gosele et al., 2000), and zebrafish (DANA/mermaid repeat) (Shimoda et al., 1996). Because these SINE elements are present at very high frequency in their respective genomes, two such repeats will often lie close together, sometimes in opposite orientations. A single primer corresponding to a sequence close to the end of the repeat consensus sequence can then bind to each of the two closely located, oppositely orientated repeats (Nelson et al., 1989). IRS-PCR can be applied to any genomic DNA template. If the starting DNA is complex (e.g., genomic DNA, cell hybrid DNA), IRS-PCR generates a series of bands that cannot be resolved by conventional agarose gel electrophoresis. The amplified IRS-PCR products range in size from 200 to 2000 bp. If IRS-PCR is carried out on low-complexity templates (e.g., YAC, PAC, or BAC clones), only one or a few products are generated, and these can be used directly as markers (Schalkwyk et al., 2001). Apart from the incorporated primers, most IRS probes consist of unique, single-copy DNA sequence. IRS-PCR products from large-insert genomic clones can therefore be compared with equivalent products from other library clones to identify overlapping clones by their shared amplification products. They can also be used as hybridization probes for screening large-insert clone libraries (see Section 4.2). Large numbers of IRS markers can be generated both rapidly and cost-efficiently, because there is no need to sequence the markers or to design locus-specific primers.
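A minimal in silico version of single-primer IRS-PCR makes the orientation requirement concrete. The sketch marks primer sites on the top strand ("+", extending rightward) and on the bottom strand ("-", extending leftward). It reports a product only for an adjacent "+" then "-" pair, that is, two nearby repeats in opposite orientations, no longer than 2,000 bp. Exact matching is a simplification, since real primers tolerate mismatches. The 10-nt primer and the templates are hypothetical and are not a published IRS primer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """Start positions of every exact occurrence of motif (overlaps allowed)."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def irs_pcr_products(template: str, primer: str, max_len: int = 2000) -> list[int]:
    """Predicted product sizes for single-primer IRS-PCR (exact-match model)."""
    k = len(primer)
    sites = sorted([(p, "+") for p in find_all(template, primer)] +
                   [(p, "-") for p in find_all(template, revcomp(primer))])
    sizes = []
    for (p1, s1), (p2, s2) in zip(sites, sites[1:]):
        # Only a '+' site followed by a '-' site has 3' ends pointing at each other.
        if s1 == "+" and s2 == "-" and p2 >= p1 + k:
            size = p2 + k - p1
            if size <= max_len:
                sizes.append(size)
    return sizes


primer = "GGCTCACGCC"                      # hypothetical 10-nt repeat-end primer
rc = revcomp(primer)
one_pair = "T" * 50 + primer + "A" * 200 + rc + "T" * 50
same_way = "T" * 50 + primer + "A" * 200 + primer + "T" * 50
print(irs_pcr_products(one_pair, primer), irs_pcr_products(same_way, primer))
```

Two repeats in the same orientation give no product, because only one primer binding site faces inward.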


Figure 2 IRS-PCR permits amplification of DNA sequences located between two closely positioned but oppositely orientated repeat elements. A single primer corresponding to a sequence close to the end of the repeat consensus sequence can bind to each of two closely located, oppositely orientated repeat sequences. Amplification is carried out with a single primer complementary to the 5′-3′ repetitive element

4.2. Hybridization-based STS mapping

Once YAC clones are obtained by PCR-based screening of YAC libraries, YAC end fragments can be isolated as described above. YAC insert end sequences can then be used as hybridization probes to screen colony filters from a YAC, PAC, or BAC library to identify adjacent clones.

An alternative strategy for integrated physical and genetic mapping is based on the interrepeat or interspersed repetitive sequence (IRS)-PCR system. It enables high-throughput, time- and cost-efficient screening of large numbers of clones by hybridization.

The IRS-PCR-based physical mapping strategy relies on detecting clone overlaps by hybridizing an individual IRS-PCR product to filters carrying the IRS-PCR products of clone pools. To this end, the coordinate pools that define a library are amplified with the repetitive-element primer, and the resulting IRS-PCR pool products are spotted onto nylon membranes in ordered arrays. Individual IRS-PCR products of previously mapped clones are used as hybridization probes to identify overlapping clones. The overlapping clones are then amplified in turn by IRS-PCR, and their products are used as the next hybridization probes (see Figure 3). Clone contigs are built by repeating these rounds of PCR and hybridization. Because each round extends the contig in both directions, chromosome walking based on IRS-PCR and hybridization is highly efficient.
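The iterative probe-and-reprobe cycle can be sketched as a breadth-first walk. Two simplifications are assumed. First, each clone is represented by a set of IRS-product identifiers. Second, a hybridization is taken to reveal every clone sharing at least one product with the probe, skipping the step of deconvoluting pool-filter signals into clone addresses. The clone names and product IDs are hypothetical.

```python
def overlapping_clones(probe: str, irs: dict[str, set[int]]) -> set[str]:
    """Clones sharing at least one IRS-PCR product with the probe clone, i.e.,
    the positives a hybridization with the probe's products would reveal."""
    return {c for c, products in irs.items() if c != probe and products & irs[probe]}


def walk(seed: str, irs: dict[str, set[int]], max_rounds: int = 10) -> tuple[set[str], int]:
    """Repeat 'hybridize, then re-probe with the new positives' until no new
    clones appear. Returns the contig members and the number of rounds run."""
    contig, frontier, rounds = {seed}, {seed}, 0
    while frontier and rounds < max_rounds:
        new = set()
        for probe in frontier:
            new |= overlapping_clones(probe, irs) - contig
        contig |= new
        frontier = new
        rounds += 1
    return contig, rounds


# Hypothetical IRS-product content; integers stand for distinct products.
irs = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}, "D": {0, 1}, "E": {9}}
print(walk("B", irs))
```

Starting from B, the first round finds A and C, one on each side of B, which reflects the bidirectional walking described above. The second round adds D, and the third finds nothing new.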


Figure 3 Schematic representation of the IRS-PCR-based physical mapping strategy for the construction of an integrated radiation hybrid and physical map

The main advantage of this approach is that it permits the simultaneous screening of multiple probes and/or libraries in one hybridization step, avoiding gel electrophoresis, sequencing, and primer design. This technology was used to construct clone contigs in mouse (Hunter et al., 1994) and for physical mapping of the mouse (Hunter et al., 1996; McCarthy et al., 1995; Schalkwyk et al., 2001) and rat genomes (Krzywinski et al., 2004). For the physical mapping of the rat genome, two mapping methods were combined to gain information about the proximity of marker loci. In the first, YAC libraries were screened by hybridization-based assays to identify clones containing a given locus. Nearby loci tend to be present in many of the same clones, allowing proximity to be inferred. Marker-content linkage can be detected over distances of about 800 kb, given the average insert size of the YAC library used (Figure 4).
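The idea that nearby loci share many clones can be turned into a simple similarity score. The sketch ranks markers by the Jaccard index of their positive-clone sets, defined as shared clones divided by all clones positive for either marker. This measure is our choice for illustration and not necessarily the statistic used in the cited studies. The marker and YAC names are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Shared clones divided by all clones positive for either marker."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_neighbours(marker: str, content: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Other markers ordered by decreasing shared-clone similarity."""
    scores = {m: jaccard(content[marker], clones)
              for m, clones in content.items() if m != marker}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical marker -> positive YAC clones from hybridization screening.
content = {"m1": {"y1", "y2", "y3"}, "m2": {"y2", "y3", "y4"},
           "m3": {"y4", "y5"}, "m4": {"y9"}}
print(rank_neighbours("m1", content))
```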


Figure 4 YAC contig of a region of the rat chromosome 10, constructed by IRS-PCR-based physical mapping

In the second method, hybrid cell lines, each containing many chromosomal fragments produced by radiation breakage, are screened to identify those hybrids that have retained a given locus. Nearby loci tend to show similar retention patterns, allowing proximity to be inferred. RH linkage can be detected over distances of about 2-3 Mb, given the average fragment size of the RH panel used. For the construction of a physical map and assembly of contigs, YAC clones with positive hybridization signals are considered. This set of YAC clones must be pruned carefully to remove chimeric clones, an inherent problem with any YAC library (Green et al., 1999). The final map of each chromosome can be constructed by integrating the YAC-linkage information with the known radiation hybrid map positions of the IRS markers using, for example, the co2 software package (Hudson et al., 1995). Doubly linked contigs, in which markers are connected by at least two shared YAC clones, are identified first. Single-linkage information is then used to join doubly linked contigs known to lie nearby.
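The doubly linked versus singly linked distinction can be illustrated with a union-find grouping of markers. Two markers are joined when they share at least `min_shared` YAC clones, so requiring two shared clones prevents a single chimeric YAC from falsely joining unrelated loci. This is a minimal sketch, not the co2 package. It only groups markers; it does not order them within a contig or integrate RH positions. The marker-to-YAC data are hypothetical.

```python
from itertools import combinations


def linked_groups(content: dict[str, set[str]], min_shared: int) -> list[list[str]]:
    """Group markers connected, directly or through other markers, by at least
    `min_shared` common YAC clones (union-find over all marker pairs)."""
    parent = {m: m for m in content}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(content, 2):
        if len(content[a] & content[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for m in content:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


content = {"m1": {"y1", "y2"}, "m2": {"y1", "y2", "y3"}, "m3": {"y3", "y4"},
           "m4": {"y4", "y5", "y6"}, "m5": {"y5", "y6"}, "m6": {"y9"}}
print(linked_groups(content, 2))  # doubly linked contigs
print(linked_groups(content, 1))  # single linkage joins them
```

Here double linkage yields the contigs {m1, m2} and {m4, m5}, plus singletons. Single linkage, through the clones y3 and y4, chains m1 to m5 together. This is the joining step described in the text.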

The primary goal of physical mapping is to assemble a comprehensive series of DNA clones with overlapping inserts. Clone-based laboratory methods remain an important component of the study of large genomes, through applications such as fluorescent in situ hybridization and comparative genomic hybridization. Physical maps provide an ordered, high-resolution, redundant clone set spanning the entire genome. This clone set is an important resource for easily identifying and accessing clones that span regions of interest. The generation of in-depth physical maps will therefore continue to be a desirable component of functional genomics.
