Primary Structure (Molecular Biology)

Protein structure is classified in a hierarchical manner into primary structure, secondary structure, tertiary structure, and quaternary structure. The primary structure of a protein is the sequence, or order, of amino acid residues in the polypeptide chain and can be represented in a variety of ways (two examples are given in Fig. 1). By convention, the sequence is numbered from the N-terminus to the C-terminus of the polypeptide chain. The primary structure also includes information about the number of polypeptide chains in a protein and any covalent modifications, such as disulfide bond formation, phosphorylation, sulfation, or glycosylation (see N-Glycosylation and O-Glycosylation). In addition, nonprotein groups, or prosthetic groups, such as heme, metal ions, or pigments, form part of the primary structure of a protein. The significance of a protein’s primary structure is that it determines both the three-dimensional structure and the function of that protein.

Figure 1. Alternative representations of the primary structure of the 16-residue peptide α-conotoxin EpI (1). The upper representation gives the amino acid sequence using the three-letter code; the lower representation uses the one-letter code for the amino acid residues. Both representations show the presence of two disulfide bonds, sulfation of the tyrosine residue, and amidation of the C-terminus (see Post-Translational Modifications).
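
The two notations mentioned above can be converted mechanically. The following is a minimal Python sketch, assuming residues written in three-letter code and separated by hyphens; the function names and the example peptide are hypothetical and are not the sequence shown in Fig. 1. Covalent modifications such as disulfide bonds or sulfation are not represented.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_seq, sep="-"):
    """Convert e.g. 'Met-Gly-Cys' to 'MGC' (hypothetical helper)."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_seq.split(sep))


def numbered(one_letter_seq):
    """Pair each residue with its position, counting from 1 at the N-terminus."""
    return list(enumerate(one_letter_seq, start=1))


# Made-up peptide used only for illustration.
example = "Met-Gly-Cys-Ser-Tyr-Lys"
print(to_one_letter(example))                 # MGCSYK
print(numbered(to_one_letter(example))[:3])   # [(1, 'M'), (2, 'G'), (3, 'C')]
```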

A protein’s primary structure can be determined either directly, using a combination of biochemical and chemical techniques (see Protein Sequencing), or, more often, indirectly, by determining the nucleotide sequence of the corresponding gene or complementary DNA. If two different proteins have similar primary structures, they are generally inferred to be homologous, that is, to share a common evolutionary origin, and are likely to have similar tertiary structures and functions. Sequence databases such as SWISS-PROT (1) hold information about the primary structures of many thousands of proteins and can be searched to identify homologous proteins.
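
The indirect route, from nucleotide sequence to primary structure, can be illustrated with a short Python sketch. It assumes the standard genetic code, a coding-strand DNA sequence whose reading frame starts at the first base, and translation that stops at the first stop codon. The function name `translate` and the example sequences are hypothetical. Post-translational modifications cannot be deduced this way.

```python
# Standard genetic code, written in the conventional T, C, A, G ordering
# of the first, second, and third codon positions ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(coding_dna):
    """Translate a coding-strand DNA string, N-terminus first (illustrative)."""
    dna = coding_dna.upper()
    protein = []
    # Read complete codons only; a trailing partial codon is ignored.
    for start in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[start:start + 3]]
        if residue == "*":  # stop codon ends the polypeptide chain
            break
        protein.append(residue)
    return "".join(protein)


# Made-up sequence used only for illustration.
print(translate("ATGGGCTGCTGA"))  # MGC
```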

In vivo, the primary structure of each protein is genetically encoded. Mutations in the gene, or errors in reading it, can lead to changes in the protein primary structure. Such variations from the normal or wild-type sequence can result in changes to both the structure and function of a protein and can affect the viability of the cell, the organism, or both. Conversely, changes may be engineered into the primary structure of a protein in vitro by site-directed mutagenesis to investigate the role of specific residues.
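
A single-residue change to a wild-type sequence can be modeled in code as a simple string edit. The sketch below assumes the commonly used shorthand in which "C3A" means "Cys at position 3 replaced by Ala", with positions counted from the N-terminus. The function name and example sequence are hypothetical, and the code describes only the resulting sequence, not the laboratory procedure.

```python
import re


def apply_substitution(sequence, mutation):
    """Return the variant sequence for a substitution such as 'C3A'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, replacement = (
        match.group(1), int(match.group(2)), match.group(3)
    )
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    # Guard against applying the change to the wrong residue.
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + replacement + sequence[position:]


# Made-up wild-type sequence used only for illustration.
print(apply_substitution("MGCSYK", "C3A"))  # MGASYK
```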

Theoretically, it is possible to predict the tertiary structure of a protein directly from knowledge of its primary structure (see Protein Structure Prediction). In practice, however, this so-called protein folding problem is not yet solved, except in those instances where protein structures can be predicted by homology modeling.
