Reverse Translation (Molecular Biology)

Translation is the biosynthesis of a protein from a messenger RNA template on ribosomes. Reverse translation is not a biological process; it is the inference of possible DNA sequences from the amino acid sequence of a protein. Reverse translation is often employed to design a hybridization probe or a PCR primer used to clone the gene encoding the protein of interest (see Cloning) (1-3).

In most cases, the amino acid sequence of a protein is directly inferred from the DNA or mRNA sequence coding for the protein because each nucleic acid triplet (codon) specifies either a single amino acid or a termination signal (see Genetic Code). The converse, determining the DNA or mRNA sequence coding for a specific amino acid sequence, is more complex because the genetic code is "degenerate" (see Degeneracy of the Genetic Code). In the nuclear DNA of eukaryotes, 61 codons specify 20 amino acids, so many amino acids are coded by more than one codon. This means that reverse translation of a protein does not produce a single nucleotide sequence. Instead, it results in a population of different sequences that, if translated, would all code for the same amino acid sequence. To identify the actual genomic sequence that codes for the protein in vivo, it is necessary to clone and sequence the gene for the protein. The first step in cloning is to synthesize a mixture of oligonucleotides (oligos) that corresponds to all of the potential coding sequences determined by reverse translation. This pool of oligos is used as a "degenerate" (mixed) hybridization probe to isolate the corresponding DNA or cDNA clone from a library. Alternatively, reverse translation is used to design two sets of "degenerate" PCR primers to amplify the gene from genomic DNA.
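To make the degeneracy concrete, the following is a minimal Python sketch that enumerates every DNA sequence that could encode a short peptide. It assumes the standard genetic code and one-letter amino acid symbols; the function names (`count_coding_sequences`, `reverse_translate`, `translate`) are illustrative and do not come from any particular software package.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: amino acid (or '*' for stop) -> list of synonymous codons.
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def count_coding_sequences(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    total = 1
    for aa in peptide:
        total *= len(CODONS_FOR[aa])
    return total


def reverse_translate(peptide):
    """List every DNA sequence that encodes the peptide (short peptides only)."""
    choices = (CODONS_FOR[aa] for aa in peptide)
    return ["".join(codons) for codons in product(*choices)]


def translate(dna):
    """Translate a DNA coding sequence, read in frame from its first base."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    print(reverse_translate("MWK"))        # Met-Trp-Lys: only Lys is degenerate
    print(count_coding_sequences("SLR"))   # Ser-Leu-Arg: 6 * 6 * 6 possibilities
```

Every sequence returned by `reverse_translate` translates back to the same peptide, which is why the full set, rather than any single member, has to be represented in a mixed probe or primer pool. The number of sequences grows multiplicatively with peptide length, so full enumeration is practical only for short stretches.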


When using reverse translation to design degenerate oligos for gene cloning, several factors must be taken into account. A 14-base oligo is sufficiently long to identify the gene of interest specifically, but the five-residue stretch of protein that is reverse translated to produce this oligo must be chosen carefully. The more protein sequence that is known, the easier it is to find an appropriate amino acid stretch to reverse translate. Because serine, leucine, and arginine are each coded by six different codons, stretches containing these residues should be avoided. Stretches containing tryptophan and methionine are preferred, because each of these residues is coded by only one codon. Choosing less degenerate amino acids reduces the number of distinct oligo sequences needed to cover all reverse translation possibilities, and a less complex set of oligos makes a more efficient probe or PCR primer. Different organisms preferentially use particular codons to specify amino acids, and this codon usage bias should also be taken into account when designing oligos by reverse translation. Additionally, computer programs are available to help design synthetic genes and degenerate probes and primers by using reverse translation (4).
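The choice of stretch can be sketched as a sliding-window search for the five-residue window with the fewest possible coding sequences. The Python example below is an illustration only: it assumes the standard genetic code, ignores codon usage bias, and uses a made-up protein sequence; `CODON_COUNT` and `least_degenerate_window` are hypothetical names, not part of any published program.

```python
# Number of synonymous codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "M": 1, "W": 1,
    "F": 2, "Y": 2, "C": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2,
    "I": 3,
    "V": 4, "P": 4, "T": 4, "A": 4, "G": 4,
    "L": 6, "S": 6, "R": 6,
}


def degeneracy(stretch):
    """Number of distinct oligos needed to cover every coding possibility."""
    total = 1
    for aa in stretch:
        total *= CODON_COUNT[aa]
    return total


def least_degenerate_window(protein, width=5):
    """Return (start index, stretch, degeneracy) of the least degenerate window.

    Ties are resolved in favour of the window closest to the N-terminus.
    """
    if len(protein) < width:
        raise ValueError("protein is shorter than the requested window")
    windows = (
        (start, protein[start:start + width])
        for start in range(len(protein) - width + 1)
    )
    start, stretch = min(windows, key=lambda w: degeneracy(w[1]))
    return start, stretch, degeneracy(stretch)


if __name__ == "__main__":
    # Hypothetical protein fragment, chosen only to illustrate the search.
    fragment = "SLRMWKDMFCLSRG"
    print(least_degenerate_window(fragment))  # (3, 'MWKDM', 4)
```

In this invented fragment, the window rich in methionine and tryptophan needs only 4 oligos, whereas a window such as Cys-Leu-Ser-Arg-Gly would need 1,728. A real design would also weigh codon usage in the target organism, which this sketch does not attempt.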

Once many of the large-scale genome sequencing projects are complete (e.g., the human, Caenorhabditis elegans, and Arabidopsis sequencing projects), and with the ever-increasing number of expressed sequence tags (ESTs) available, reverse translation will most frequently be used to isolate genes from organisms for which little sequence data is available. For well-studied organisms, database searches (in which a computer program compares known protein sequences with translated nucleotide sequences to look for similarities) will replace the need to reverse translate and clone in order to determine the nucleotide sequence.
