They expressed these proteins from a standard expression vector in E. coli and observed a
250-fold variation in protein levels, as measured by GFP fluorescence with a microplate
reader. The stability of mRNA folding near the start codon explained more than half of this
variation. Secondary structures can inhibit translation initiation and were the dominant
factor influencing gene expression levels. Codon choice can affect these structures by
changing the folding free energy of the mRNA. The authors concluded that the key factor is
avoiding mRNA secondary structure near the ribosome binding site, and that codon bias
matters in fewer cases than previously thought.
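Computing the folding free energy of an mRNA requires a nearest-neighbor thermodynamic model, which dedicated RNA folding tools implement and which is too large for a short example. As a crude, hedged stand-in, the sketch below uses the Nussinov algorithm, which counts the maximum number of nested base pairs a sequence can form; a window that supports many pairs is loosely more prone to fold. The window bounds (4 nt upstream, 37 nt downstream of the start codon), the minimum hairpin loop of 3 nt, and the names `start_window` and `nussinov_max_pairs` are illustrative assumptions, not the parameters or method used in the study.

```python
# Hedged sketch: Nussinov base-pair maximization as a CRUDE proxy for local
# mRNA secondary structure. It counts nested base pairs; it does NOT compute
# thermodynamic folding free energy (that needs a nearest-neighbor model).

# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def start_window(mrna: str, start: int, upstream: int = 4, downstream: int = 37) -> str:
    """Return the region around a start codon at 0-based index `start`.

    The upstream/downstream bounds are illustrative assumptions.
    """
    return mrna[max(0, start - upstream):start + downstream]


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs; hairpin loops need >= min_loop bases."""
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]
```

Comparing `nussinov_max_pairs(start_window(seq, start))` across synonymous variants gives only a rough ranking; quantitative folding free energies would need a thermodynamic folding tool.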
Welch et al. 80 presented a novel systematic analysis of the relationship between sequence
and expression. They first demonstrated the utility of synonymous codon variation by
designing and synthesizing 40 codon variants of genes encoding two structurally and
functionally different proteins, a DNA polymerase and a single-chain variable fragment
(scFv), and observed a 40-fold variation in expression, measured from bacterial lysates on
polyacrylamide gels. The variants were designed using a Monte Carlo (repeated random
sampling) algorithm. Expression correlated poorly with the Codon Adaptation Index (CAI), a
widespread gene design principle that measures, and aims to mimic, the usage of preferred
codons in a particular host genome. Welch et al. argued that favorable codons are not the
codons that are most abundant in highly expressed E. coli genes. This was significant
because various gene synthesis vendors had used CAI and related rules to predict gene
expression levels. They then examined whether the sequence features affecting expression
were local (mRNA structures, codon clusters) or global (codon usage, GC content). They
divided the genes of the best and worst expressers into thirds and made chimeras by
exchanging these segments. Some chimeras showed highly distributed effects, while others
depended strongly on the parental origin of particular segments; overall, expression levels
were not specific to any one region. The variation among the chimeras was largely explained
by the predictive partial least squares (PLS) model the authors had used to generate the gene
variants. Finally, when their model was cross-referenced with various design parameters, the
results suggested a biochemical basis for preferred codons: the sensitivity of aminoacylated
tRNA levels to starvation conditions, drawing on predictive modeling work by Elf et al. 85
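Two of the analysis ideas in this paragraph can be made concrete. CAI is commonly defined as the geometric mean, over the scored codons of a gene, of each codon's relative adaptiveness w, where w is the codon's count in a reference set of highly expressed genes divided by the count of the most-used synonymous codon: CAI = exp((1/L) * sum of ln w_i over the L scored codons). The chimera design can be sketched as swapping thirds between a high and a low expresser. The minimal sketch below assumes raw reference codon counts, skips Met, Trp, and stop codons, and gives unobserved codons a pseudocount of 0.5 (a common convention, assumed here). The function names are hypothetical, and the code does not reproduce Welch et al.'s PLS model.

```python
import math
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))


def relative_adaptiveness(ref_counts: dict) -> dict:
    """w(codon) = count / count of the most-used synonymous codon in the reference set."""
    best = {}
    for codon, aa in CODE.items():
        best[aa] = max(best.get(aa, 0), ref_counts.get(codon, 0))
    w = {}
    for codon, aa in CODE.items():
        # Skip stops and single-codon amino acids (Met, Trp): they offer no choice.
        if aa in "*MW" or best[aa] == 0:
            continue
        # Unobserved codons get a pseudocount of 0.5 so log(w) stays finite.
        w[codon] = max(ref_counts.get(codon, 0), 0.5) / best[aa]
    return w


def cai(seq: str, w: dict) -> float:
    """Geometric mean of w over the scorable codons of an in-frame coding sequence."""
    logs = [math.log(w[seq[i:i + 3]]) for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def thirds_chimeras(parent_a: str, parent_b: str) -> dict:
    """All 2**3 - 2 = 6 chimeras swapping thirds of two equal-length coding sequences.

    Cut points fall on codon boundaries, so synonymous parents yield chimeras
    that still encode the same protein. Keys such as "ABA" record the parent
    of origin for each third.
    """
    if len(parent_a) != len(parent_b) or len(parent_a) % 3:
        raise ValueError("parents must be equal-length, in-frame coding sequences")
    n_codons = len(parent_a) // 3
    cuts = [0, (n_codons // 3) * 3, (2 * n_codons // 3) * 3, len(parent_a)]
    chimeras = {}
    for origin in product("AB", repeat=3):
        if len(set(origin)) == 1:
            continue  # skip the two unmodified parents
        parts = [(parent_a if p == "A" else parent_b)[cuts[k]:cuts[k + 1]]
                 for k, p in enumerate(origin)]
        chimeras["".join(origin)] = "".join(parts)
    return chimeras
```

Because CAI rewards only similarity to the reference codon usage, two variants with the same CAI can still differ in local features such as mRNA structure near the start codon, which is one way the weak correlation reported above could arise.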
The authors avoided overselling their claims: rather than offering a simple solution for codon
optimization, they presented their systematic analysis as a design tool. A larger sample of
gene variants is still needed before more specific conclusions about codon optimization can
be drawn.
Advances in the throughput of de novo DNA synthesis have made combinatorial library screens
increasingly pervasive. 86-88 With low-cost DNA synthesis, a large library of gene variants can
be synthesized and analyzed on high-throughput expression screening platforms. Tian's
group 19 applied this strategy, using their on-chip gene synthesis technology to synthesize
oligonucleotide libraries and assembling them into libraries of codon variants for LacZα and
74 Drosophila transcription factors. In a single round of synthesis and screening, clones were
obtained with expression levels ranging from 0% to almost 60% of total cellular protein mass.
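Both the Monte Carlo design used by Welch et al. and the library screens described here start from many synonymous versions of one protein. A minimal sketch of generating such a library by repeated random sampling follows. It assumes the standard genetic code and uniform sampling among synonymous codons; the optional `weights` map is a hypothetical hook for biased sampling, and the function names are illustrative rather than either group's actual design software.

```python
import random

# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))

# Group codons by the amino acid (or stop, "*") they encode.
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence codon by codon."""
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def random_variant(protein: str, rng: random.Random, weights=None) -> str:
    """Back-translate `protein`, drawing each codon at random from its synonyms.

    `weights` is an optional, hypothetical codon -> weight map; without it,
    synonymous codons are drawn uniformly.
    """
    codons = []
    for aa in protein:
        choices = SYNONYMS[aa]
        w = [weights.get(c, 1.0) for c in choices] if weights else None
        codons.append(rng.choices(choices, weights=w, k=1)[0])
    return "".join(codons)


def variant_library(protein: str, n: int, seed: int = 0, weights=None,
                    max_tries: int = 100) -> list:
    """Collect up to n distinct synonymous variants by repeated random sampling."""
    rng = random.Random(seed)
    library = set()
    for _ in range(max_tries * n):
        if len(library) >= n:
            break
        library.add(random_variant(protein, rng, weights))
    return sorted(library)
```

Because every codon is drawn from the synonyms of the same amino acid, `translate(v)` returns the original protein for every variant. A practical design pipeline would typically also screen variants against constraints such as unwanted restriction sites or extreme GC content before synthesis.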
Genome Synthesis
Although assembling typical gene-length sequences in less than two weeks is now routine,
assembly of longer sequences is still costly and unpredictable. At the genome scale, costs
become prohibitive: the oligonucleotides alone for a 10^6 bp genome would cost on the order
of USD 100 000, an unreasonable cost for a typical laboratory. If microarrays are used,
however, oligonucleotide costs could potentially be reduced to roughly USD 100, 21 and less
than a single chip would be needed to make all of the oligonucleotides. High costs have
nonetheless not deterred researchers from the de novo synthesis of longer sequences.
Synthesis length records have historically grown exponentially (a straight line on a
logarithmic scale). In 2002, the 7.5 kb poliovirus genome was