Arabidopsis (Molecular Biology)

Arabidopsis thaliana (L.) Heynh. is a dicotyledonous plant that is ideally suited for molecular-genetic studies. It is generally accepted as a model for unraveling the molecular mechanisms involved in plant growth and development, biochemical pathways, cell biology, physiology, and pathogenic interactions. Minimal genomic DNA content, few repetitive DNA sequences, and small gene families account for its technical and biological simplicity. Arabidopsis is a natural diploid amenable to laboratory-scale genetic experiments. It is small (15 cm to 30 cm) and can be grown at high density (100 plants per 0.5 m²) without seed contamination. It has a short life cycle (6 weeks to 3 months) and high seed production by self-fertilization (up to 10,000 seeds per plant), is easily mutagenized, and, with its approximately 200 ecotypes, is a natural source of genetic variation. International coordination of Arabidopsis research has resulted in fast exchange of material, methodology, and information and in the creation of large-scale research programs, such as the genome-sequencing and expressed-sequence tag (EST) projects. The completion of the entire genome sequence was a milestone in plant biology because, for the first time, a molecular overview was obtained of the pathways that plants share with other eukaryotes and of those in which they differ. Moreover, every plant gene is now accessible for functional analysis. In the post-genomic era, Arabidopsis research maintains its pioneering position in the field of plant science.


1. History

The first botanical description of Arabidopsis, by Johannes Thal, goes back to 1577. In 1907, Laibach used the plant to study the continuity of the chromosomes (5 per haploid genome), and he was the first to emphasize the advantages of the species for genetic analyses (1). In 1976, Bennett and Smith showed that Arabidopsis has the smallest nuclear DNA content of the angiosperms analyzed. In that period, extensive chemical and irradiation mutagenesis of the plant was performed (2). Koornneef et al. (3) published the first genetic map containing 76 morphological markers. In the mid-eighties, Meyerowitz and coworkers demonstrated the small size and low complexity of the genome, which provoked a general interest in using the plant as an experimental model (4). In 1988, efficient transformation methods opened the potential of transgenic research in the species (5). At the same time, saturation mutagenesis of the genome by insertion of heterologous DNA was initiated and resulted in large collections that became available to the scientific community (6). In 1989, the US National Science Foundation launched the ‘Multinational Coordinated Long Range Plan for Arabidopsis Genome Research’, steered by an international board of scientists, with the aim of promoting Arabidopsis as a model system for plants, in analogy to other models such as Drosophila melanogaster and Caenorhabditis elegans. The major achievements of this initiative were seed and DNA stock centers (the Arabidopsis Biological Resource Center [Ohio State University, Columbus, OH, USA] and the Nottingham Arabidopsis Stock Centre [University of Nottingham, Loughborough, UK]), a database, and joint efforts for physical mapping and sequencing of the genome and for gene identification. The Arabidopsis genome sequencing project was initiated in Europe at the end of 1993, followed by an American and a Japanese initiative. This international consortium, the Arabidopsis Genome Initiative, completed the entire sequence by the end of 2000 (7).

2. Classification, Geographical Distribution, and Ecotypes

Arabidopsis thaliana is an annual herb of the mustard family (Brassicaceae, previously named Cruciferae), has bisexual flowers, and is typified by a cross-shaped corolla, tetradynamous stamens (four long and two short ones), and capsular fruits (siliques). The genus Arabidopsis consists of 27 species and has been classified under a new tribe, the Arabidae, based on classical morphological and molecular phylogenetic studies. Arabidopsis is a facultative long-day plant, meaning that long days accelerate the initiation of flowering. It originated in Eurasia and North Africa, but is now a common weed in the temperate regions of the Northern Hemisphere. Its broad geographical distribution has resulted in natural variation. Approximately 200 ecotypes (wild populations) have been registered. These ecotypes represent a natural source of heritable variation: for instance, differences are observed in flowering time, response to cold, fresh weight production, and pathogen resistance. Among ecotypes, a DNA sequence polymorphism of up to 1.4% in low-copy DNA has been measured, arising from deletions, insertions, and substitutions. Frequently used ecotypes are Columbia, Niederzenz, Wassilewskija, and the laboratory strain Landsberg erecta. The Columbia ecotype was used as the standard for the genome sequence, whereas other ecotypes are preferred for mutagenesis. DNA polymorphisms are exploited in F2 populations for mapping and quantitative trait locus (QTL) analyses.

3. Genome Structure and Organization

The size of the nuclear genome of Arabidopsis was estimated to be between 50 and 150 Mb, as determined by reassociation kinetics (see C-Value), flow cytometry, or electron microscopy. From recent physical mapping data, this size was refined to 100 to 140 Mb, which is 3- to 400-fold smaller than that of other members of the angiosperms.

In 2000, the genome sequence of Arabidopsis was published, covering 115.4 megabases of the 125-megabase genome (7). A whole-genome duplication as well as lateral gene transfer from a cyanobacterial-like ancestor appear to be part of the evolution of the Arabidopsis genome. Single- and low-copy DNA makes up 80% of the Arabidopsis genome. Four classes of highly repeated DNA have been identified, which represent only 10% of the genome and are located mainly at the telomeres and around the centromeres. Ribosomal DNA accounts for 6% of the genome and is localized at the top of chromosomes 2 and 4. In addition to the genomic sequencing, more than 30,000 redundant Arabidopsis ESTs have been sequenced and submitted to public databases. Comparison of the EST data with the data of the genome sequencing program indicates that approximately 60% of the genes are represented by an EST.

From the sequence data, several conclusions can be drawn on gene organization and DNA composition: the gene density is one gene per 4.5 kb; the average gene length is 2 kb; genes have between 0 and 30 introns; 25,498 predicted transcripts encode proteins of approximately 11,000 families; and roughly 30% of the genes could not be assigned to functional categories. The gene families are either dispersed or arranged in tandem arrays; long stretches (approximately 120 kb) of unique or low-copy DNA are interspersed with short stretches of a few kb of moderately repeated DNA; the GC content of the DNA is 35%; and methylation occurs in 6% of the cytosine bases.
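As a rough consistency check on these figures, the quoted gene density can be recomputed from the sequenced length and the number of predicted transcripts. The sketch below is illustrative only: it assumes one gene per predicted transcript and uses the 115.4 Mb of sequence and the 125 Mb genome size quoted for the published genome sequence; all function and variable names are hypothetical.

```python
# Figures quoted in the text for the published genome sequence.
SEQUENCED_MB = 115.4       # sequenced portion of the genome, in Mb
GENOME_MB = 125.0          # estimated total genome size, in Mb
PREDICTED_GENES = 25_498   # assumption: one gene per predicted transcript
AVG_GENE_KB = 2.0          # average gene length, in kb


def gene_spacing_kb(sequenced_mb: float, n_genes: int) -> float:
    """Average amount of sequenced DNA per gene, in kb."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return sequenced_mb * 1000.0 / n_genes


def genic_fraction(avg_gene_kb: float, spacing_kb: float) -> float:
    """Fraction of the sequenced DNA occupied by genes of average length."""
    return avg_gene_kb / spacing_kb


spacing = gene_spacing_kb(SEQUENCED_MB, PREDICTED_GENES)
coverage = SEQUENCED_MB / GENOME_MB
genic = genic_fraction(AVG_GENE_KB, spacing)

if __name__ == "__main__":
    print(f"sequence coverage of the genome: {coverage:.1%}")
    print(f"average spacing: one gene per {spacing:.1f} kb")
    print(f"fraction of sequence within genes: {genic:.1%}")
```

Under these assumptions the spacing comes out at about 4.5 kb per gene, in line with the quoted density, and genes of average length occupy a little under half of the sequenced DNA.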

The availability of the genome sequences of Arabidopsis and several other organisms allows comparisons at the genome level. For example, Arabidopsis, Drosophila, and C. elegans have a similar number of different protein types (approximately 11,000 to 15,000), suggesting that this is the minimal number of proteins needed for a functional multicellular organism. Basic processes, such as translation, appear to be conserved across kingdoms, whereas more specialized processes use proteins that differ between plants and animals. These include membrane channels and transporters, components of signal transduction pathways, and transcription factors (8). Plants contain roughly 150 unique protein families (7).

Information on the genome project and on Arabidopsis research in general is centralized in "The Arabidopsis Information Resource" (TAIR) database. Its home page on the Internet (http://www.arabidopsis.org/home.html) provides extensive information on ongoing research (genetic and physical maps, genome sequence information, links to public databases, seed and DNA stock centers, etc.).

4. Genetics and Physical Mapping

Based on metaphase staining and genetic linkage group analyses, the haploid chromosome number is 5, with chromosomes ranging in size from 13.4 Mb to 25.4 Mb. The genetic map comprises more than 460 loci. The ratio of physical to genetic distance between markers is, on average, 200 kb per cM. However, the construction of the physical map of chromosome 4 revealed recombination hot spots (30-50 kb/cM) and low spots (>550 kb/cM) (9). Many tools are available to map a mutation, for example, recessive visible markers, codominant embryo-lethal markers, dominant selectable markers on mapped T-DNA and Activator/Dissociation insertions, and restriction fragment length polymorphism (RFLP)-derived or PCR-based molecular markers, such as microsatellites, insertions/deletions (INDEL), and single nucleotide polymorphisms (SNP). Several molecular marker maps have been constructed based on RFLP, random amplified polymorphic DNA (RAPD), or amplified fragment length polymorphism (AFLP) as well as on different mapping populations. A combined map, made by statistical integration, gives an approximate position and order for the markers. Recombinant inbred (RI) lines, derived from a cross between Columbia and Landsberg erecta (10), have been used to locate 1265 molecular markers to date. The physical map consists of contigs of DNA clones that are correlated with the mapped markers. Currently, yeast artificial chromosome (YAC), bacterial artificial chromosome (BAC), and phage P1 artificial chromosome (PAC) contig maps are available that cover the entire genome.
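The ratio of physical to genetic distance can be used for a back-of-the-envelope estimate of how much DNA separates two mapped markers. The following sketch uses only the ratios quoted above; the function names and the cut-offs used to label a region are assumptions made for illustration, since the text gives ranges rather than formal thresholds.

```python
AVG_KB_PER_CM = 200.0            # genome-wide average quoted in the text
HOT_SPOT_MAX_KB_PER_CM = 50.0    # hot spots: 30-50 kb/cM (chromosome 4)
LOW_SPOT_MIN_KB_PER_CM = 550.0   # low spots: >550 kb/cM (chromosome 4)


def physical_distance_kb(genetic_cm: float,
                         kb_per_cm: float = AVG_KB_PER_CM) -> float:
    """Estimate the physical distance (kb) spanned by a genetic interval (cM)."""
    if genetic_cm < 0 or kb_per_cm <= 0:
        raise ValueError("genetic_cm must be >= 0 and kb_per_cm must be > 0")
    return genetic_cm * kb_per_cm


def classify_region(kb_per_cm: float) -> str:
    """Label a region by its local kb/cM ratio (illustrative cut-offs only)."""
    if kb_per_cm <= HOT_SPOT_MAX_KB_PER_CM:
        return "hot spot"
    if kb_per_cm > LOW_SPOT_MIN_KB_PER_CM:
        return "low spot"
    return "intermediate"


if __name__ == "__main__":
    interval_cm = 2.0  # hypothetical genetic interval between two markers
    for ratio in (40.0, AVG_KB_PER_CM, 600.0):
        distance = physical_distance_kb(interval_cm, ratio)
        print(f"{ratio:6.1f} kb/cM ({classify_region(ratio)}): "
              f"{interval_cm} cM is about {distance:.0f} kb")
```

For a 2 cM interval, the average ratio gives about 400 kb, whereas the same genetic distance would correspond to only 60 to 100 kb inside a hot spot and to more than 1,100 kb in a low spot, so the genome-wide average is only a first approximation.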

5. Scientific Advances and Applications

The molecular-genetic approach in Arabidopsis research, together with cell biology tools such as confocal microscopy, laser cell ablation, and dye-loading, has led to major breakthroughs in plant developmental biology (11-13). Tremendous progress has been made in understanding the molecular control of meristem identity in vegetative meristems, during flower initiation and flower organ formation, embryo development, and pattern formation during embryogenesis, root development, epidermal cell fate specification in root hair, and trichome formation. Genes have been identified that are involved in hormone perception, biosynthesis, and signal transduction. The first hormone receptor for plants has been characterized in Arabidopsis (14). Much of the molecular insight into light perception and signal transduction, cell cycle regulation, and disease resistance in higher plants comes from studies in Arabidopsis (15-17). Biochemical topics, such as cell wall biology, are being explored. Within the next 10 years, the function of every gene is expected to be determined. This will be achieved by reverse genetics techniques, such as knockouts and targeting-induced local lesions in genomes (TILLING) (18), to create allelic diversity and by implementation of genome-wide methods, such as transcript profiling and microarrays. In general, biological processes will be analyzed by genomics and bioinformatics tools in order to identify all the molecular components involved.

The Arabidopsis genes and mutants are resources that are exploited to isolate orthologs from other species and to test their functional conservation (19), or that are used for the genetic modification of even distantly related crop plants (20). The molecular markers within contigs in Arabidopsis have been used for comparative mapping with Brassica spp. Colinearity in 5- to 10-cM regions has been demonstrated between the Arabidopsis genome and that of Brassica nigra (21). This implies that information and markers obtained from the physical mapping in Arabidopsis can be applied to syntenic genomic regions in mustard crops to analyze important traits in breeding programs. Microarrays of Arabidopsis have proven to be useful for the analysis of transcription profiles of developing seeds in related species, such as Brassica (22), and will be a great help in the study of seed biology in engineered plants. Arabidopsis can therefore be expected to serve as a reference system that accelerates the accumulation of knowledge in crop species. DNA sequence analysis revealed genes in the Arabidopsis genome that suggest the existence of secondary metabolism pathways in this plant (7). Hence, Arabidopsis might be a helpful tool to unravel these pathways in medicinal plants.
