regions and that these regions can be selected
effectively. We can use our current knowledge of
the organization of the wheat genome, obtained
through a combination of physical (deletion)
mapping and BAC sequencing, to estimate the
size of the gene space in wheat. To do this, we
first need to know what the gene number is in
wheat. Paux et al. (2006) analyzed about 11 Mb
of BAC-end sequences derived from chromosome
3B for the presence of genes. Considering that
about 1.2% of the BAC-end sequences consisted
of coding regions, and assuming an average gene
size (coding region only) of 2 kb, the 1-Gb-large
3B chromosome was estimated to contain 6,000
genes (Paux et al., 2006). Extrapolating to the
entire B genome gave an estimate of 36,000 genes.
Extrapolation from 800 kb of WGS sequence, on
the other hand, led to an estimate of approxi-
mately 98,000 genes per genome (Rabinowicz
et al., 2005). The large gene number is thought to
be due to a recent amplification of pseudogenes
in the hexaploid wheat genome. However, since
neither study discriminated between functional
genes and pseudogenes, this amplification does not
explain the large discrepancy between the gene
numbers estimated from BAC-end and WGS
sequences. The difference is inherent to the data
sets and is not due to the annotation methodology
used.
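The arithmetic behind the BAC-end estimate for chromosome 3B can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function and variable names are assumptions, and the inputs are the figures quoted in the text (a 1-Gb chromosome, 1.2% coding sequence, and a 2-kb average coding region).

```python
def estimate_gene_count(region_size_bp, coding_fraction, mean_coding_size_bp):
    """Estimate gene number as (coding bases in the region) / (mean coding size)."""
    coding_bp = region_size_bp * coding_fraction
    return coding_bp / mean_coding_size_bp

# Figures quoted in the text for chromosome 3B (Paux et al., 2006).
genes_3b = estimate_gene_count(
    region_size_bp=1_000_000_000,   # 1 Gb
    coding_fraction=0.012,          # 1.2% of BAC-end sequence was coding
    mean_coding_size_bp=2_000,      # 2 kb, coding region only
)
print(round(genes_3b))

# The text extrapolates to 36,000 genes for the B genome, which implies
# a B genome roughly six times the size of chromosome 3B.
implied_ratio = 36_000 / genes_3b
print(round(implied_ratio))
```

The estimate scales linearly with the coding fraction and inversely with the assumed coding-region size, so it is sensitive to both inputs.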
Reannotation of the Rabinowicz data set by
Paux and colleagues led to a similarly high gene
number (107,000 per wheat genome). A preliminary
annotation of genes in 6.7 Mb of sequence
obtained from randomly selected BAC clones
suggested the presence of 195,000 genes in the
hexaploid wheat genome, or some 65,000 per
genome (X. Xu, K.M. Devos, P. San Miguel,
J.L. Bennetzen, unpublished data), which is
close to the average of the Paux et al. (2006) and
Rabinowicz et al. (2005) studies.
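As a quick check of this comparison, the average of the two published per-genome estimates can be computed directly. The variable names below are illustrative only.

```python
# Per-genome gene-number estimates quoted in the text.
paux_2006 = 36_000        # from BAC-end sequences of chromosome 3B
rabinowicz_2005 = 98_000  # from whole-genome shotgun (WGS) sequence

average_estimate = (paux_2006 + rabinowicz_2005) / 2
print(average_estimate)  # 67000.0
```

The per-genome figure from the randomly selected BAC clones lies within a few thousand genes of this average.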
Assuming that the gene number in wheat
(including both functional genes and pseudo-
genes) is 195,000 and that the average size of
a wheat coding region is 2 kb, the wheat gene
space would occupy approximately 390 Mb. Of
course, if the gene space is isolated from BAC
clones, then we also have to take intergenic dis-
tances into account. From the Lr10 and VRN-2
studies, we have learned that average gene densi-
ties, considering not only gene islands but also
interisland distances, are in the range of 1 gene
per 40-55 kb in the distal chromosome regions.
Assuming that this gene density is maintained
along the telomere-centromere axis—although
we know that this assumption is incorrect—we
would have to sequence 8,800 Mb or about 50%
of the wheat genome to obtain a minimum of
95% of the genes. Preliminary annotation of 66
randomly selected BAC clones has indicated that
less than 50% of the genes are organized in gene
islands and that the remainder is present in the
genome as singletons (X. Xu, K.M. Devos, P.
San Miguel, and J.L. Bennetzen, unpublished
data). This study also suggests that obtaining
94% of the wheat genes by selecting gene-
containing BAC clones would require sequenc-
ing around 50% of the hexaploid wheat genome.
The cost of this approach would be approxi-
mately $65 million.
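The size estimates in this paragraph follow from simple multiplication, sketched below. The function names are assumptions made for illustration. The inputs are the figures quoted in the text, and the uniform gene density is the simplification that the text itself notes is incorrect.

```python
GENE_COUNT = 195_000   # functional genes plus pseudogenes, as assumed in the text
CODING_SIZE_KB = 2     # average coding-region size, as assumed in the text

def coding_space_mb(gene_count, coding_size_kb):
    """Total coding sequence in Mb, ignoring intergenic DNA."""
    return gene_count * coding_size_kb / 1_000

def sequence_needed_mb(gene_count, kb_per_gene):
    """Sequence in Mb needed to cover all genes at a uniform density
    of one gene per kb_per_gene kb (a simplifying assumption)."""
    return gene_count * kb_per_gene / 1_000

print(coding_space_mb(GENE_COUNT, CODING_SIZE_KB))  # 390.0

# Density range reported for distal chromosome regions: 1 gene per 40-55 kb.
for kb_per_gene in (40, 55):
    print(kb_per_gene, sequence_needed_mb(GENE_COUNT, kb_per_gene))
```

The two density bounds give 7,800 Mb and 10,725 Mb. The 8,800 Mb quoted in the text lies within this range and corresponds to roughly one gene per 45 kb.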
Sequencing the gene space using gene-enrichment methodologies
The rationale behind gene-enriched sequencing
strategies is that they provide a large proportion
of the genes without having to sequence the
repeats. Since the gene space is expected to be of
similar size in large and small genomes, sequencing
the gene space is a cost-effective way of
obtaining the genes, particularly in large genomes.
The traditional way of obtaining gene sequences
is by end-sequencing cDNA libraries to produce
ESTs. In wheat, approximately 1 million ESTs
are currently available in GenBank (Ogihara
et al., 2003; Zhang et al., 2004a; Houde et al.,
2006; Mochida et al., 2006). The only plant
species for which higher numbers of ESTs are
available are rice (1.2 million), maize (1.4 million),
and Arabidopsis (1.5 million). In Arabidopsis, it
has been estimated that the ESTs represent only
around 60% of the annotated genes (The Arabidopsis
Genome Initiative, 2000). Similar, or
lower, figures are available for other species, suggesting
that EST sequencing alone does not
capture all genes present in a genome (Barbazuk