Analyzing proteomic, genomic and transcriptomic elemental compositions to uncover the intimate evolution of biopolymers (Proteomics)

To grow and reproduce, organisms must acquire the constituents of biopolymers from their environments. A number of the elemental constituents of biopolymers, including phosphorus, carbon, hydrogen, nitrogen, and sulfur, are subject to biogeochemical cycles and may vary in abundance in both space and time. Therefore, organisms must be adapted to survive both the persistent conditions of their habitats and substantial perturbations in the availability of nutrients. Several studies have now revealed that adaptive biases in the elemental composition of biopolymers may provide one mechanism for dealing with these perturbations. They also demonstrate that nutritional constraints may shape the evolution of biopolymers at a more intimate scale than amino acid (see Article 96, Fundamentals of protein structure and function, Volume 6) or base composition (see Article 47, The mouse genome sequence, Volume 3; Article 7, Genetic signatures of natural selection, Volume 1). The growing availability of data on genomes (see Article 1, Eukaryotic genomics, Volume 3; Article 2, Genome sequencing of microbial species, Volume 3; Article 65, Environmental shotgun sequencing, Volume 4), proteomes (see Article 34, Large-scale protein annotation, Volume 7; Article 94, Expression and localization of proteins in mammalian cells, Volume 4) and transcriptomes (see Article 79, Technologies for systematic analysis of eukaryotic transcriptomes, Volume 4; Article 81, Using ESTs for genome annotation – predicting the transcriptome, Volume 4) now provides an unprecedented opportunity to study adaptive imprints in the atomic composition of biopolymers, and the factors driving these adaptations.


In their seminal 1989 paper, Mazel and Marliere reported such an imprint in the light-harvesting phycobilisome of the cyanobacterium Calothrix sp. PCC 7601 (Mazel and Marliere, 1989). They determined that Calothrix expresses a phycobilisome specifically depleted in sulfur-containing amino acids when grown in a sulfur-limited medium. Since phycobilisomes can account for up to 60% of the soluble proteins of cyanobacteria, they proposed that this differential expression allows a significant sparing in the quantity of sulfur atoms required for protein synthesis. The existence in Calothrix of an operon encoding a phycobilisome specifically depleted in sulfur atoms was seen as evidence that nutrient availability could have influenced biopolymer evolution (Mazel and Marliere, 1989).

More recently, a comparable mechanism was reported in the yeast Saccharomyces cerevisiae (Fauchon et al., 2002). When S. cerevisiae is exposed to cadmium, the synthesis of glutathione, a tripeptide thiol critical for cadmium detoxification, is strongly induced (Vido et al., 2001). Fauchon et al. revealed that this induction is accompanied by an extensive transcriptional and translational reorganization, leading to the transient expression of a set of proteins containing 30% fewer sulfur atoms than the set expressed in the absence of cadmium. They proposed that this allows significantly more sulfur to be dedicated to glutathione synthesis at a time when it is critical for the cell (Fauchon et al., 2002).

In the two examples above, biases in the atomic composition of subsets of proteins were apparently selected to achieve a significant sparing of an element at the scale of a whole cell. Other mechanisms could also select for biases in the elemental composition of subsets of proteins. For example, Baudouin-Cornu et al. showed that in the yeast S. cerevisiae and the bacterium Escherichia coli, proteins used for assimilating sulfur and carbon tend to contain fewer sulfur and carbon atoms, respectively, relative to the rest of the proteomes of these organisms (Baudouin-Cornu et al., 2001). The reduced use of an element in a small number of moderately expressed proteins likely has little impact on the overall use of this element by a cell. However, when an element is scarce in the growth medium, amino acids containing this element may be less abundant in the cell, as shown for the two sulfur-containing amino acids in S. cerevisiae (Lafaye et al., 2005). This would result in a decrease in the corresponding aminoacyl-tRNAs, as observed in the case of amino acid starvation (Dittmar et al., 2005), and may reduce the rate of translation of proteins according to their content of this element. Baudouin-Cornu et al. proposed that the biases they reported were selected during transient episodes of scarcity in sulfur or carbon, and allow S. cerevisiae and E. coli to maintain functional sulfur or carbon assimilatory pathways when sulfur or carbon is scarce in their environment (Baudouin-Cornu et al., 2001). Taken together, these studies (Mazel and Marliere, 1989; Baudouin-Cornu et al., 2001; Fauchon et al., 2002) support the conclusion that systematic biases in the atomic composition of subsets of proteins can result from adaptation to transient changes in the environment.
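The kind of comparison described above can be illustrated with a minimal Python sketch. It assumes protein sequences are available as one-letter amino acid strings and relies only on the fact that cysteine and methionine are the two sulfur-containing standard amino acids. The function names and the pooled per-residue measure are illustrative choices for this sketch, not the statistics used in the cited studies.

```python
# Sulfur atoms per residue for the 20 standard amino acids: only cysteine (C)
# and methionine (M) contain sulfur, one atom each.
SULFUR_ATOMS = {"C": 1, "M": 1}


def sulfur_per_residue(sequence):
    """Mean number of sulfur atoms per residue in one protein sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    atoms = sum(SULFUR_ATOMS.get(aa, 0) for aa in sequence.upper())
    return atoms / len(sequence)


def pooled_sulfur_per_residue(sequences):
    """Sulfur atoms per residue pooled over a set of proteins."""
    sequences = list(sequences)
    residues = sum(len(seq) for seq in sequences)
    if residues == 0:
        raise ValueError("no residues in the protein set")
    atoms = sum(SULFUR_ATOMS.get(aa, 0) for seq in sequences for aa in seq.upper())
    return atoms / residues


def relative_sulfur_content(subset, background):
    """Ratio of pooled sulfur content, subset / background.

    Values below 1 indicate that the subset is depleted in sulfur relative
    to the background set (hypothetical measure, for illustration only).
    """
    return pooled_sulfur_per_residue(subset) / pooled_sulfur_per_residue(background)
```

In practice the subset would be, for instance, the sulfur assimilation enzymes and the background the remainder of the predicted proteome; assessing whether a ratio below 1 is significant requires a statistical test that this sketch does not attempt.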

However, for organisms growing under the normal conditions of their habitat (i.e., in the absence of a severe, transient disruption in nutrient supply), the quantities of different elements required for synthesizing biopolymers are likely influenced by the atomic composition of whole genomes, proteomes, or total RNA (hereafter, “RNAome”), rather than that of specific subsets of proteins. Therefore, at this scale, biopolymers may be expected to reflect adaptation to persistent, rather than transient, environmental features. A small number of studies have now attempted to evaluate the extent of variation among organisms in the elemental composition of biopolymers at the scale of whole genomes and proteomes, and to identify factors (e.g., traits of species) that are associated with this variation.

Figure 1 Schematic representation of the relationships between carbon and nitrogen contents of the three main biopolymer classes. The carbon (C) and nitrogen (N) compositions of adenine (A), thymine (T), uracil (U), guanine (G) and cytosine (C) are indicated in the lower left-hand corner. Statistics on the correlations between genomic GC contents and proteomic nitrogen and carbon contents can be found in Baudouin-Cornu et al., 2004 and Bragg and Hyder, 2004. See text for more details.

Analysis of the elemental composition of genomes (double-stranded DNA) is straightforward, since genomic carbon and nitrogen contents are each related directly to guanine and cytosine (GC) content. The bases of each GC pair contain 8 nitrogen atoms and 9 carbon atoms, while those of each adenine and thymine (AT) pair contain 7 nitrogen atoms and 10 carbon atoms. Therefore, a GC-rich genome contains more nitrogen and less carbon per base pair than an AT-rich genome (see Figure 1). This observation, taken together with the wide variation in GC content among bacteria (ca. 25% to >70%), prompted McEwan et al. to test the association between bacterial genomic GC content and the ability to fix atmospheric nitrogen (McEwan et al., 1998). They found that within aerobic genera, nitrogen-fixing bacteria had higher genomic GC content, and therefore higher genomic nitrogen content per base pair, than nonfixing species (McEwan et al., 1998). They suggested that this may represent an adaptive association between nitrogen expenditure in genomes (and potentially in RNAomes, see below) and nitrogen fixation. However, different types of biopolymers are present in different abundances and may therefore have different consequences for nutrient use. According to Neidhardt and Umbarger’s compilation, 3% of the dry weight of a bacterial cell is accounted for by DNA, 55% by proteins, and 20% by RNA (Neidhardt and Umbarger, 1996). Therefore, in terms of nutritional constraints, the elemental composition of genomes may be less important than the elemental composition of proteomes or RNAomes.
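The relationship between GC content and the carbon and nitrogen contents of double-stranded DNA can be made explicit with a short Python sketch. It uses only the per-pair atom counts quoted above, which refer to the bases and not to the sugar-phosphate backbone; the function names are illustrative assumptions.

```python
# Carbon and nitrogen atoms in the bases of each Watson-Crick pair,
# as quoted in the text (backbone atoms are not counted).
C_PER_GC_PAIR, N_PER_GC_PAIR = 9, 8
C_PER_AT_PAIR, N_PER_AT_PAIR = 10, 7


def gc_fraction(sequence):
    """Fraction of G and C among the unambiguous bases of a DNA sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases in sequence")
    return gc / (gc + at)


def atoms_per_base_pair(gc):
    """Return (carbon, nitrogen) atoms per base pair for a GC fraction in [0, 1]."""
    if not 0.0 <= gc <= 1.0:
        raise ValueError("GC fraction must lie between 0 and 1")
    carbon = C_PER_GC_PAIR * gc + C_PER_AT_PAIR * (1.0 - gc)
    nitrogen = N_PER_GC_PAIR * gc + N_PER_AT_PAIR * (1.0 - gc)
    return carbon, nitrogen
```

Both quantities are linear in GC content: nitrogen per base pair equals 7 plus the GC fraction, and carbon per base pair equals 10 minus the GC fraction. Across the range of bacterial GC contents cited above (ca. 25% to 70%), nitrogen in the bases therefore varies from about 7.25 to 7.7 atoms per base pair.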

Predictions of proteomes from genome sequences have allowed interesting insights into the proteome elemental compositions of unicellular organisms, especially prokaryotes (see Article 13, Prokaryotic gene identification in silico, Volume 7). For example, Bragg and Hyder found that proteome nitrogen content was positively correlated, and proteome carbon content negatively correlated, with genome GC content (Bragg and Hyder, 2004). Since GC-rich genomes are themselves richer in nitrogen and poorer in carbon, the carbon and nitrogen contents of genomes are correlated positively with the carbon and nitrogen contents of proteomes, respectively. These correlations are due to the structure of the genetic code (Baudouin-Cornu et al., 2004; Bragg and Hyder, 2004). Additionally, predicted whole proteomes allow more subtle studies of proteomic elemental composition than the comparison of means alone. Using the quantile representation proposed by Karlin et al. to compare amino acid compositions (Karlin et al., 1992), Baudouin-Cornu et al. found the same negative correlation between genome GC contents and proteome carbon contents, but also showed that the quantile distributions of proteome carbon contents were stochastically ordered, whereas the quantile distributions of proteome nitrogen contents were not (Baudouin-Cornu et al., 2004). This suggests that mean values of proteomic carbon content likely provide a reliable indication of relative proteomic carbon use among species, whereas mean values of proteomic nitrogen content are less likely to provide a reliable indication of relative proteomic nitrogen use. That is, the expression levels of proteins with different nitrogen contents (within proteomes) may play a larger role in determining average protein nitrogen use among organisms than is the case for carbon. This highlights the potential usefulness of considering protein expression levels explicitly in studies of proteomic elemental composition, as expression data become increasingly available.
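As an illustration of these analyses, the following Python sketch computes carbon and nitrogen contents per residue for predicted proteins, an expression-weighted mean, and a simple quantile comparison between two proteomes. Several points are assumptions of the sketch rather than the procedure of the cited studies: only side-chain atoms are counted, residues outside the 20 standard amino acids are ignored, the expression weights are hypothetical inputs, and the decile-by-decile comparison is only a coarse stand-in for a formal test of stochastic ordering.

```python
from statistics import quantiles

# (carbon, nitrogen) atoms in each amino acid side chain, backbone excluded.
SIDE_CHAIN_ATOMS = {
    "G": (0, 0), "A": (1, 0), "S": (1, 0), "C": (1, 0),
    "T": (2, 0), "D": (2, 0), "N": (2, 1), "V": (3, 0),
    "P": (3, 0), "E": (3, 0), "Q": (3, 1), "M": (3, 0),
    "L": (4, 0), "I": (4, 0), "K": (4, 1), "H": (4, 2),
    "R": (4, 3), "F": (7, 0), "Y": (7, 0), "W": (9, 1),
}


def side_chain_content(sequence):
    """Return mean (carbon, nitrogen) side-chain atoms per residue."""
    residues = [aa for aa in sequence.upper() if aa in SIDE_CHAIN_ATOMS]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    carbon = sum(SIDE_CHAIN_ATOMS[aa][0] for aa in residues)
    nitrogen = sum(SIDE_CHAIN_ATOMS[aa][1] for aa in residues)
    return carbon / len(residues), nitrogen / len(residues)


def weighted_mean(values, weights):
    """Mean of per-protein values weighted by (hypothetical) expression levels."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def quantile_profile(values, n=10):
    """Cut points dividing per-protein values into n equal-probability groups."""
    return quantiles(values, n=n, method="inclusive")


def quantiles_ordered(lower, upper, n=10):
    """True if every quantile of `lower` is <= the matching quantile of `upper`."""
    pairs = zip(quantile_profile(lower, n), quantile_profile(upper, n))
    return all(a <= b for a, b in pairs)
```

For two proteomes, `quantiles_ordered` would be applied to the lists of per-protein carbon (or nitrogen) contents returned by `side_chain_content`. When the quantile profiles cross, as reported above for nitrogen, an unweighted mean is a weaker summary and `weighted_mean` with measured expression levels becomes more informative.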

Although both studies of proteomic carbon and nitrogen composition led to interesting observations (Baudouin-Cornu et al., 2004; Bragg and Hyder, 2004), they did not identify links between the elemental composition of proteomes and the environments in which different organisms live. A recent study of proteome sulfur contents has revealed such relationships (Bragg et al., in press). In particular, we observed a tendency for species living at high temperature to have lower proteomic sulfur use. To our knowledge, this is the first example of a simple relationship between an environmental feature and the quantity of a specific element used in proteins, at the level of whole proteomes. However, as genome sequences and lifestyle data become available for a growing number of microorganisms, we anticipate that more relationships will be revealed between environmental factors and the atomic composition of proteomes or RNAomes.

Variation in the nitrogen and carbon content of RNAomes among organisms has not (to our knowledge) been considered explicitly, despite the observation that RNAs may account for 20% of cellular dry mass (Neidhardt and Umbarger, 1996). The bases of RNA, A, C, G, and U (uracil), contain 5, 3, 5, and 2 nitrogen atoms, respectively, while A and G each contain 5 carbon atoms, and C and U each contain 4 carbon atoms. The nitrogen and carbon contents of RNAomes are not related exactly to GC content (as they are in genomes), because (1) RNAs in cells are typically single-stranded, and (2) different RNA molecules may vary greatly in their abundance. However, if the parity G = C and A = U holds approximately (Chargaff's second parity rule; see Forsdyke and Mortimer (2000) for a review), and GC-rich genomes encode GC-rich RNAomes, it is likely that organisms with high genomic nitrogen content (high GC content) would have nitrogen-rich RNAomes (McEwan et al., 1998). This suggests that organisms lie along a gradient from relatively low nitrogen content (and low GC content) in the three main biopolymers to relatively high nitrogen content (and high GC content) in these biopolymers (Figure 1). This prediction suggests that a comparison of RNAome elemental composition among species could reveal interesting features and deserves to be undertaken. In particular, ribosomal RNAs (rRNAs) constitute up to 80% of the total RNA in a cell, and may therefore have a disproportionate influence on the elemental composition of total RNA. An analysis of the elemental composition of rRNA sequences among different organisms may thus provide useful insights into variation in nitrogen expenditure in RNA.
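A calculation of this kind could be sketched as follows in Python, using the per-base atom counts given above (bases only, ribose and phosphate excluded). The abundance weighting addresses point (2): each RNA species contributes in proportion to its copy number times its length. The function names and the `(sequence, copies)` input format are assumptions of this sketch, and the copy numbers would have to come from expression data.

```python
# (carbon, nitrogen) atoms in each RNA base, as given in the text.
RNA_BASE_ATOMS = {"A": (5, 5), "G": (5, 5), "C": (4, 3), "U": (4, 2)}


def count_base_atoms(sequence):
    """Return (carbon, nitrogen, counted_bases) for one RNA sequence.

    DNA-style T is read as U; ambiguous symbols such as N are skipped.
    """
    carbon = nitrogen = counted = 0
    for base in sequence.upper().replace("T", "U"):
        if base in RNA_BASE_ATOMS:
            c, n = RNA_BASE_ATOMS[base]
            carbon += c
            nitrogen += n
            counted += 1
    return carbon, nitrogen, counted


def rna_atoms_per_nucleotide(sequence):
    """Mean (carbon, nitrogen) base atoms per nucleotide of one RNA."""
    carbon, nitrogen, counted = count_base_atoms(sequence)
    if counted == 0:
        raise ValueError("no unambiguous bases in sequence")
    return carbon / counted, nitrogen / counted


def rnaome_atoms_per_nucleotide(transcripts):
    """Abundance-weighted (carbon, nitrogen) per nucleotide for a set of RNAs.

    `transcripts` is an iterable of (sequence, copies) pairs, where `copies`
    is a hypothetical per-cell abundance for that RNA species.
    """
    total_c = total_n = total_bases = 0.0
    for sequence, copies in transcripts:
        carbon, nitrogen, counted = count_base_atoms(sequence)
        total_c += carbon * copies
        total_n += nitrogen * copies
        total_bases += counted * copies
    if total_bases == 0:
        raise ValueError("no nucleotides to average over")
    return total_c / total_bases, total_n / total_bases
```

Because rRNAs dominate total RNA, applying `rna_atoms_per_nucleotide` to rRNA sequences alone may already approximate the RNAome value reasonably well, although that remains to be tested.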

Future studies of the atomic composition of biopolymers hold much promise for our understanding of both biopolymer evolution and the nutrient requirements of organisms. Such studies should continue to examine variation in atomic composition both among subsets of biopolymers within organisms and among organisms. For example, they might include analyses aimed at testing whether multicellular organisms have biases in the atomic composition of biopolymers that are expressed during specific stages of development, or in specific types of cells. Among organisms, it may be enlightening to study the atomic composition of biopolymers by testing hypotheses concerning traits that influence the access of different organisms to specific nutrients.
