Microarray comparative genome hybridization (Genomics)

1. Introduction and history

Comparative genome hybridization (CGH) is a method for genome-wide detection of chromosomal differences (see Article 11, Human cytogenetics and human chromosome abnormalities, Volume 1) between a sample and a control that are due to DNA copy number changes. Briefly, total genomic DNA from a “test” and a “reference” individual is labeled with different fluorescent dyes and cohybridized to a representation of the genome in the presence of CoT-1 DNA, which is used to block repetitive sequence. (At a given temperature, the rate of DNA renaturation depends on the product of initial concentration (Co) and time (t). CoT-1 DNA is the rapidly reassociating, and thus highly repeat-enriched, fraction of genomic DNA. It is typically derived by denaturing sheared gDNA at a concentration of 3 mM, reassociating for 5.5 min, and then isolating the reassociated double-stranded product; 3 mM × 330 s gives a Co × t value of approximately 1 mol·s/l, hence the name.) The ratio of test to reference signal at each locus provides a map of copy number variation in the genome of the “test” individual. Originally, metaphase chromosome spreads were used as the genome representation (Kallioniemi et al., 1992) for CGH, and in this format the technique has been widely used (Nacheva et al., 1998; Brown and Botstein, 1999; James, 1999; Weiss et al., 1999; Ness et al., 2002; Albertson and Pinkel, 2003) for the analysis of tumors (see Article 14, Acquired chromosome abnormalities: the cytogenetics of cancer, Volume 1) and developmental abnormalities such as mental retardation and congenital anomalies. A number of experimental and analytical modifications (see Article 58, CGH data analysis, Volume 7) have been proposed to increase resolution, such as standard reference intervals (Kirchhoff et al., 1999), and precision, such as four-color CGH (Karhu et al., 1999). Microarray comparative genome hybridization (maCGH) represents an evolution of the classical method, whereby chromosome spreads are replaced by DNA fragments of known genomic location spotted on a microarray slide.
There are several variations of the maCGH format, in which either BACs (Bacterial Artificial Chromosomes), cDNAs, or oligonucleotides are used as the DNA target. Regardless of format, maCGH offers distinct advantages over both classical CGH and other microscopic cytogenetic methods. The resolving power of maCGH is considerably greater than the maximum of approximately 5 Mb achievable by G-banding or 1-2 Mb (amplifications) and 10 Mb (deletions) by conventional CGH (Bentz et al., 1998; Kirchhoff et al., 1999). The only theoretical limits on resolution are the number, size, and sampling density of the targets on the array. Further, the method is more scalable than microscopic methods, allowing the parallel and quantitative (Moore et al., 1997; Kirchhoff et al., 1998; Quackenbush, 2002; Geller et al., 2003) evaluation of large numbers of samples, and does not require intact chromosomes for analysis. The most significant limitations of maCGH are (1) the genomic location of amplified DNA sequence is not known and (2) unless chromosomes are first separated by flow cytometry then labeled and hybridized individually (a process called array painting (Fiegler et al., 2003a)), the assay is blind to chromosomal aberrations that do not result in copy number changes, such as balanced translocations. Nonetheless, maCGH has proven its utility through the detection of DNA copy number changes in tumors (Albertson et al., 2000; Bruder et al., 2001; Struski et al., 2002; Nakao et al., 2004; Cai et al., 2001; Zhao et al., 2004), children with mental retardation and various dysmorphic syndromes (Shaw-Smith et al., 2004; Veltman et al., 2002; Xu and Chen, 2003; Yu et al., 1997), and molecular evolution (Locke et al., 2003).
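The test-to-reference signal ratio described in the introduction is usually handled on a log2 scale, so that gains and losses are symmetric around zero. The following is a minimal sketch of that step, not a published pipeline: the function name, the input lists, and the choice of median centering as a normalization are all assumptions made for illustration.

```python
import math
import statistics

def log2_ratios(test_signal, reference_signal):
    """Per-locus log2(test/reference), centred on the array median.

    Median centring is an assumed normalization: it treats the typical
    locus on the array as having no copy number change.
    """
    raw = [math.log2(t / r) for t, r in zip(test_signal, reference_signal)]
    centre = statistics.median(raw)
    return [value - centre for value in raw]

# Hypothetical intensities for five loci (arbitrary units).
test = [100.0, 100.0, 150.0, 50.0, 100.0]
reference = [100.0, 100.0, 100.0, 100.0, 100.0]
ratios = log2_ratios(test, reference)
```

In this illustrative input, the third locus sits at log2(3/2), the value expected for a single-copy gain, and the fourth at log2(1/2), the value expected for a single-copy loss.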

2. BAC arrays

Presently, the construction of genomic microarrays is dominated by the use of BACs as the target for hybridization. Several of the BAC libraries that provided critical positional information for guiding sequencing and assembly of the human genome (Lander et al., 2001; Venter et al., 2001) are available for array construction. Initial use of these resources provided first-generation arrays with approximately 1-Mb resolution (Snijders et al., 2001; Fiegler et al., 2003b). Recently, an optimal tiling set of clones providing coverage for the entire genome has been selected (Krzywinski et al., 2004) and a high-resolution BAC microarray has been manufactured using these clones (Ishkanian et al., 2004). The theoretical resolution of this clone set is based on the degree of clone overlap, and is calculated to be 75 kb.

BACs are desirable as hybridization targets not only because their genomic positions are known accurately but also because their large insert size (approximately 150-200 kb on average) allows integration of hybridization signal over a comparatively large region and gives sufficient sensitivity to routinely detect single copy number changes starting with only a few hundred nanograms of labeled test DNA (Albertson and Pinkel, 2003). Preparation and spotting of BACs onto arrays is made difficult by the low yield of DNA from BAC cultures and the large molecular weight of the DNA. Both factors hinder handling of DNA at the high concentration necessary for achieving a good signal-to-noise ratio in hybridizations. These problems have been overcome by preparing a representation of each BAC clone by ligation-mediated PCR (LMPCR), whereby clones are fragmented, oligonucleotide adapters are ligated to the ends of the fragments, and the fragments are amplified by PCR using adapter-specific primers. In this manner, a large and renewable quantity of DNA suitable for array printing is generated. LMPCR was the first reported technique for the preparation of clones for maCGH, and ratio data obtained from arrays composed of LMPCR BAC representations have been shown to be essentially identical to ratios obtained with intact DNA from the same BACs (Pinkel et al., 1998). Degenerate oligo primed PCR (DOP-PCR) (Fiegler et al., 2003b) and rolling circle amplification (RCA) (Smirnov et al., 2004; Buckley et al., 2002) have also been successfully used in the preparation of BAC DNA for spotting. The principal drawbacks of BAC arrays include the ultimate limit of resolution determined by their large insert size and the continued necessity for using large amounts of CoT-1 DNA to block highly repetitive sequences (although numerical methods exist to mitigate this effect (Kirchhoff et al., 1997)). Further complications from repeat elements arise in telomeric and pericentromeric regions.
While these regions often contain loci of interest, they are highly repetitive and therefore masked by CoT-1 DNA. Care must also be taken to avoid being misled by low-copy-repeat elements that are not masked by CoT-1 DNA. It is estimated that 5% of the human genome is made up of interspersed duplications (see Article 26, Segmental duplications and the human genome, Volume 3) (Eichler, 2001; Bailey et al., 2002) that represent, for example, homology between gene families, and these naturally occurring duplications can confound analysis of BAC maCGH data.

3. cDNA arrays

The use of cDNA clones as the target for hybridization in maCGH (Pollack et al., 1999; Kargul et al., 2001; VanBuren et al., 2002; Yamamoto et al., 2002) has obvious advantages in terms of the number and variety of clone sets and prefabricated arrays available for human studies, but also for studies of other model organisms, pathogens, disease vectors, novel therapeutics, and organisms of industrial importance for which no genome sequence or validated large insert genomic clone set is yet available. While CGH using cDNA arrays is informative only for coding sequence, concentrating resolving power on this fraction of the genome can be considered an advantage, particularly when gDNA and RNA are available from the same individual, allowing cointerrogation of gene dosage and gene expression at precisely the same loci. Information on copy number changes in gene regulatory regions or other nontranscribed regions may be missed, but the same is true for some of the current generation BAC arrays that do not offer complete genome coverage. The principal drawback in using cDNA clones as hybridization targets is limited sensitivity. Relatively large amounts of labeled DNA (up to 10 µg) are required for each hybridization and the resulting signal must be averaged over a number of clones to define local copy number (Pollack et al., 1999). While large genomic amplifications are readily detectable, cDNA arrays are generally not considered to be the best tool for detection of single copy number differences.
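Averaging signal over neighboring clones, as mentioned above for cDNA arrays, can be sketched as a moving average over log2 ratios ordered by genomic position. This is an illustrative sketch only: the window size and the function name are assumptions, and it is not presented as the procedure used in the cited studies.

```python
def moving_average(log2_ratios, window=5):
    """Centred moving average over clones sorted by genomic position.

    Windows are truncated at the ends of the list, so edge values are
    averaged over fewer clones. `window` is a hypothetical default.
    """
    half = window // 2
    smoothed = []
    for i in range(len(log2_ratios)):
        lo = max(0, i - half)
        hi = min(len(log2_ratios), i + half + 1)
        segment = log2_ratios[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed

# A single noisy clone is spread out and damped by its neighbours.
smoothed = moving_average([0.0, 0.0, 3.0, 0.0, 0.0], window=3)
```

The trade-off is visible in the example: smoothing suppresses noise in individual clones, but it also blurs the boundaries of real copy number changes, which is one reason small or single-copy changes are hard to call on this platform.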

4. Oligonucleotide arrays

Two recent developments in the application of oligonucleotide arrays to DNA copy number analysis have shown promise: representational oligonucleotide microarray analysis (ROMA) (Lucito et al., 2003; Sebat et al., 2004) and the use of Affymetrix SNP chips. ROMA is an interesting approach enabled entirely by completion of the reference human genome sequence. In ROMA, a representation of the genome sequence is prepared by digesting gDNA with a restriction enzyme (BglII), and the fragments are amplified using the same basic procedure as LMPCR, described above. ROMA arrays are spotted with oligonucleotides (70mers) that are designed to have near-homogeneous annealing characteristics and to match unique (nonrepetitive) sequence present within computationally defined BglII fragments. Thus, the target sequence on the array is repeat-free, obviating the need for CoT-1 DNA as a blocking agent. The reduced complexity of the target and probe fractions improves signal-to-noise performance and reduces the amount of sample required for hybridization. In principle, the resolution is very high, but in practice the finite number (approximately 120 000) of repeat-free BglII fragment 70mers in the human genome places an upper limit on resolution. Resolution could be increased further, in theory, by digesting with more than one restriction enzyme, and in time, different restriction enzymes and enzyme combinations will likely be found that give an optimal number and spacing of targets. An 85 000-element ROMA array has been characterized (Lucito et al., 2003) and has been shown to be capable of detecting both known and novel single and multiple copy deletions and amplifications, including several less than 100 kb in length. The practice of ROMA is restricted to organisms that have a quality whole-genome sequence available, and the array design steps are demanding, but this approach holds much promise.
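The "computationally defined BglII fragments" mentioned above come from an in silico digest of the reference sequence. The following is a minimal sketch of that first step, assuming a plain DNA string as input. BglII recognizes AGATCT and cuts after the first A; the function name and the optional size bounds are hypothetical, and repeat masking and 70mer selection are not shown.

```python
import re

BGLII_SITE = "AGATCT"  # BglII cuts A^GATCT, i.e. one base into the site

def bglii_fragments(sequence, min_len=0, max_len=None):
    """Return (start, end) coordinates of fragments flanked by two BglII sites.

    Terminal pieces of the input are skipped because they carry only one
    BglII end. `min_len` and `max_len` are hypothetical size filters.
    """
    seq = sequence.upper()
    cuts = [m.start() + 1 for m in re.finditer(BGLII_SITE, seq)]
    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        length = end - start
        if length >= min_len and (max_len is None or length <= max_len):
            fragments.append((start, end))
    return fragments

# Toy sequence with two BglII sites, at 0-based positions 3 and 13.
toy = "CCCAGATCTGGGGAGATCTAA"
fragments = bglii_fragments(toy)
```

A size filter is included because PCR amplification of adapter-ligated fragments favors shorter fragments, so only a subset of all BglII fragments ends up in the representation. The exact bounds to use are left as an assumption here.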

Single nucleotide polymorphism (SNP) arrays have recently been developed by Affymetrix for array-based genotyping (Kennedy et al., 2003), and these arrays have also shown some promise as a platform for evaluation of DNA copy number (Zhao et al., 2004; Bignell et al., 2004). These arrays contain allele-specific 25mer oligonucleotide probes complementary to SNPs predicted to be in the fraction of the genome represented by the digestion fragments generated by the enzyme used in sample preparation (typically XbaI or HindIII). Current array formats contain up to 100 000 SNPs and provide resolution as low as approximately 30 kb. Multiple different oligonucleotide probes cover each polymorphic site on both the sense and antisense strand. Like ROMA, preparation of sample DNA for hybridization relies on LMPCR for sample complexity reduction and amplification, the difference being that for the Affymetrix experiments, XbaI or HindIII, rather than BglII, is the enzyme used for digestion of sample gDNA. It is important to note that the use of SNP arrays for DNA copy number evaluation is fundamentally different from ROMA, BAC, or cDNA maCGH in two regards. First, the SNP chip assay does not rely on comparative hybridization. Rather, each sample DNA is individually labeled and hybridized. Copy number differences are detectable only in comparison to reference DNA samples evaluated in separate experiments and stored in a database provided by Affymetrix. Second, because alleles are present as separate array elements, the SNP chip platform uniquely enables loss of heterozygosity events that are caused by hemizygous deletion to be distinguished from those that are caused by copy number neutral events, such as deletion followed by subsequent duplication of the remaining locus. Loss of heterozygosity (LOH) is common in cancer cells (Vogelstein and Kinzler, 1998), where many tumor suppressor genes are inactivated by mutation in one allele and hemizygous deletion of the other wild-type allele.
However, other LOH mechanisms such as mitotic recombination or gene conversion do not lead to copy number changes, and it is important to be able to distinguish between these mechanisms. Further, genetic deletion syndromes such as Angelman’s syndrome have different outcomes depending on whether the deleted allele is maternally or paternally inherited. Thus, the ability to distinguish parent-of-origin effects has implications for genetic diagnosis and counseling.

The present Affymetrix SNP chips detect homozygous deletions, hemizygous deletions, and amplifications simultaneously with LOH detection (Zhao et al., 2004; Bignell et al., 2004). Direct comparison with BAC and cDNA array analysis (Zhao et al., 2004) showed that the three platforms gave generally comparable copy number results, although the noise of individual measurements was greater on the SNP chip platform, and analysis of raw data using a Hidden Markov Model (see Article 98, Hidden Markov models and neural networks, Volume 8) was necessary to obtain the best inference of copy number. As with ROMA, high target density is possible with this approach, and a 100 000-element XbaI-based SNP array is presently under construction, which will likely prove to be a very powerful and useful tool.
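To illustrate why a Hidden Markov Model helps with noisy per-probe measurements, the sketch below runs the Viterbi algorithm over log2 ratios with three hidden copy number states. It is a generic textbook construction, not the model used by Zhao et al.; the state means, the noise level, and the probability of staying in the same state are all assumed values chosen for illustration.

```python
import math

STATES = ("loss", "neutral", "gain")
# Assumed state means: log2(1/2), log2(2/2), log2(3/2).
MEANS = {"loss": -1.0, "neutral": 0.0, "gain": 0.58}
SIGMA = 0.3   # assumed measurement noise (standard deviation)
STAY = 0.98   # assumed probability of keeping the same state at the next probe

def _log_emission(x, state):
    """Log density of a Gaussian centred on the state mean."""
    z = (x - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2.0 * math.pi))

def viterbi(log2_ratios):
    """Most likely state sequence for probes ordered along a chromosome."""
    if not log2_ratios:
        return []
    switch = (1.0 - STAY) / (len(STATES) - 1)
    log_trans = {(a, b): math.log(STAY if a == b else switch)
                 for a in STATES for b in STATES}
    # Uniform prior over states at the first probe.
    score = {s: math.log(1.0 / len(STATES)) + _log_emission(log2_ratios[0], s)
             for s in STATES}
    back = []
    for x in log2_ratios[1:]:
        new_score, pointers = {}, {}
        for b in STATES:
            prev = max(STATES, key=lambda a: score[a] + log_trans[(a, b)])
            pointers[b] = prev
            new_score[b] = score[prev] + log_trans[(prev, b)] + _log_emission(x, b)
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

segment_call = viterbi([0.1, -0.1, 0.05, -0.9, -1.1, -1.0, -0.95, 0.0, 0.1])
outlier_call = viterbi([0.0, 0.0, -1.0, 0.0, 0.0])
```

With these assumed parameters, a run of four low probes is called as a loss, whereas a single low probe flanked by normal values is treated as noise, because the penalty for switching state twice outweighs the evidence from one measurement.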

5. Experimental considerations

While the platforms described above (BAC CGH, cDNA CGH, ROMA, Affymetrix SNP chips) have all been shown to have utility in evaluation of DNA copy number, there are several additional issues to be considered when investigating DNA copy number aberrations. These include the amount, integrity, and source of sample and reference DNA, the sensitivity of the assay, and the prevalence of copy number polymorphisms. Regarding input sample and reference DNA, BAC array CGH and the ROMA-based methods require several hundred nanograms of genomic DNA per hybridization, and cDNA arrays require considerably more. While several hundred nanograms of DNA seems a modest amount, even this quantity can be difficult to obtain from clinical samples, particularly from microdissected tissue, or from postmortem tissue where DNA may have degraded. Regarding reference DNA, it is desirable, if not essential, to use the same reference DNA within a series of hybridizations from the same study, or to compare across studies. Thus, there is a need for a large repository of reference DNA in laboratories performing this assay. Ideally, this would be constitutional DNA from a single donor with a defined karyotype, but because this is impractical, pooled DNA from multiple individuals of the same gender (available commercially from Clontech or Novagen) is often used as reference. With sufficient numbers of individuals represented in a DNA pool, any individual karyotypic anomalies become negligible. While in principle DNA from selected cell lines would offer a renewable source of reference DNA, the prevalence of karyotypic anomalies in immortalized cells may make results difficult to interpret, and caution is advised. Of note, recent success in amplifying sample and reference DNA using Phi29 polymerase or Bst polymerase suggests that this approach may provide a practical solution where input DNA is limiting for CGH experiments.
Initial results show limited representational bias and background amplification if experimental conditions are carefully controlled (Lage et al., 2003). Even with appropriate input sample and reference DNA, a common observation is that maCGH generally does not achieve theoretical values for copy number differences (Albertson and Pinkel, 2003). For example, female test versus male reference comparisons typically give less than the expected 2:1 ratio for X-chromosome probes and measurable signal for the presumably absent Y chromosome. The reasons for dynamic range suppression are poorly understood, but may relate to the presence of somatic mosaicism (see Article 18, Mosaicism, Volume 1) (as in the case of tumor samples contaminated with surrounding stromal cells of normal karyotype) or, for clone-based arrays, deletion or insertion events that span less than the length of the target cDNA or BAC. Incomplete suppression of repetitive sequence may also be implicated. It is for these reasons that independent verification of all putative copy number changes by a second method such as FISH (see Article 22, FISH, Volume 1) or quantitative real-time PCR remains essential.
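The effect of contaminating normal cells on the measured ratio can be made concrete with a simple mixing calculation. The sketch below assumes an idealized assay in which signal is proportional to the average copy number across all cells in the sample; the function name and parameters are hypothetical, and other sources of dynamic range suppression are ignored.

```python
import math

def expected_log2_ratio(tumor_copies, tumor_fraction,
                        normal_copies=2, reference_copies=2):
    """Idealized log2 ratio for a tumor/normal cell mixture.

    Assumes signal scales linearly with mean copy number per cell.
    Returns -inf when the mixture carries no copies of the locus.
    """
    mean_copies = (tumor_fraction * tumor_copies
                   + (1.0 - tumor_fraction) * normal_copies)
    if mean_copies == 0:
        return float("-inf")
    return math.log2(mean_copies / reference_copies)

pure = expected_log2_ratio(tumor_copies=1, tumor_fraction=1.0)
mixed = expected_log2_ratio(tumor_copies=1, tumor_fraction=0.5)
```

Under this assumption, a hemizygous deletion in a pure tumor sample gives a log2 ratio of -1, but the same deletion in a sample that is half stromal cells gives log2(0.75), which is much closer to zero and correspondingly harder to separate from noise.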

Copy number polymorphisms (CNPs) have the potential to confound CGH analysis. While there are only a small number of well-documented CNPs in the human population, such as the Rh locus (Wagner and Flegel, 2000), the CYP2D6 locus (Meyer and Zanger, 1997), and the green color pigment locus (Nathans et al., 1986), as maCGH becomes broadly applied it is becoming clear that CNPs are not uncommon, and represent an important source of genetic variation (Sebat et al., 2004; Iafrate et al., 2004). DNA copy number variation is clearly a hallmark of tumor cells (Vogelstein and Kinzler, 1998), and there is also evidence that substantial levels of chromosomal aneuploidy may exist in neurons (Rehen et al., 2001). Because many or perhaps most CNPs may be benign, a survey of common polymorphisms in different ethnic populations would provide a valuable resource for interpreting disease-focused CGH studies. Presently, however, owing to the unknown scope of copy number polymorphism, it is important that studies investigating CNP disease associations include an appropriate group of matched control individuals.
