Protein arrays (Proteomics)

1. Introduction

High-density DNA microarray technology has played a key role in the analysis of whole genomes and their gene expression patterns. The ability to study many thousands of individual genes, using oligonucleotide or cDNA arrays, is now very widespread, with uses ranging from the profiling of gene expression patterns in whole organisms or tissues to the comparison of healthy and pathological samples. This technology, together with the sequencing of the human genome, has produced vast amounts of gene expression profiling data and, importantly, new bioinformatics tools to handle these data. Such data reveal important information regarding gene expression; however, the function of most genes lies at the protein level. Whether studying, for example, development in a particular organism or disease processes, knowledge of alterations in protein levels, protein structures (including modifications), protein-protein interactions, and so on, is crucial to elucidating such complex biological phenomena. However, such knowledge is more difficult to attain, even if only one protein is being studied, due to the enormous complexity of the protein world. Alternative splicing, proteolytic events, and posttranslational modifications (e.g., glycosylation, acetylation, phosphorylation) are just some of the events that can affect gene products, resulting in many more proteins than the genes coding for them. Further, proteins can form complexes with other proteins, cofactors, DNA/RNA, and so on.


Traditional methods for the analysis of proteomes include two-dimensional gel electrophoresis or chromatography, which, when combined with mass spectrometry, enable large-scale separation and identification of proteins, including many of their modifications (Melton, 2004). To complement the functional analysis of proteins on a large scale, proteins can be studied in array formats. Ideally, such an array would contain functionally active proteins, in all their modified states, immobilized on a surface at high density or in solution in nanowells. However, protein activity depends on a wide range of factors (posttranslational modifications, cellular localization, pH, presence/absence of cofactors, etc.), which makes the production of a protein chip containing the whole proteome a daunting task. Current protein and antibody arrays represent the first steps toward this goal.

2. Generation of content for protein arrays

The first requirement for the generation of protein arrays is a source of large numbers of recombinant proteins. A number of strategies are currently employed to provide thousands of proteins for this purpose. One approach is high-throughput cloning, or PCR amplification, of defined open reading frames coding for the proteins of interest (Kersten et al., 2003; Reboul et al., 2003). The successful implementation of such an approach relies on the availability of sequence data and its correct annotation in the databases. In particular, the definition of the open reading frames of alternative splice variants of one protein remains difficult with such an approach. Also, previously uncharacterized proteins will be absent, limiting this approach as a discovery tool. For these reasons, this approach has proved most valuable in the production of chips containing proteins from well-characterized organisms, such as Saccharomyces cerevisiae and Caenorhabditis elegans (Schweitzer et al., 2003).

Another approach is the use of protein expression libraries, which provide a “shotgun” route to the generation of recombinant proteins (Bussow et al., 1998). Such libraries are generated using mRNA isolated directly from the cell (Lueking et al., 2000; Bussow et al., 2000). In this approach, mRNA is isolated from the tissue or organism of interest and directionally subcloned into a protein expression vector suitable for heterologous expression in either a bacterial (e.g., Escherichia coli) or eukaryotic organism (e.g., yeast). Some solutions exist for the rapid movement of coding regions from one organism to another, such as the GATEWAY system (Life Technologies) (Walhout et al., 2000) or dual expression vectors (Lueking et al., 2000). Not only does this approach circumvent the cloning of individual open reading frames, readily permitting the expression of tens of thousands of proteins, but it also automatically includes splice variants and previously uncharacterized gene products; that is, there is no selection based on presently available databases and annotations.

Both strategies generally use a system in which the expression of the recombinant proteins is controlled by an inducible promoter. Thus, the time and duration of protein expression can be tightly regulated. One common and well-characterized system for expression in bacteria is the IPTG-inducible LacZ promoter. Because recombinant proteins are being expressed in a foreign environment, for example, human proteins expressed in E. coli, measures can be taken to improve expression levels. For example, plasmids coding for tRNAs that translate codons rarely found in bacteria but more common in mammals can be introduced into the system; one such case is the argU gene on the pSE111 vector, which codes for a rare arginine tRNA (Bussow et al., 1998; Brinkmann et al., 1989). Other issues can also be addressed, such as the expression of proteins that are toxic to the host organism. This remains a challenge; however, a number of systems have been developed to minimize basal-level expression of the toxic recombinant protein in the host before induction. One such system for expression in E. coli is the pLysS vector (Stratagene), a low-copy-number plasmid that carries an expression cassette from which the T7 lysozyme gene is expressed at low levels. This T7 lysozyme binds to T7 RNA polymerase and inhibits transcription by this enzyme. This approach has been used successfully for high-throughput expression of mammalian proteins in bacteria (Ding et al., 2002).

Apart from the question of expressing the recombinant proteins of choice in a particular host system, there is also the question of tagging these proteins. A large number of tags exist, the coding regions of which can readily be introduced into vectors. Depending on the approach taken, many options are available, including N-terminal and/or C-terminal tagging. Multiple different tags are also an option, in particular in conjunction with proteolytic sites for subsequent removal of the tag, if required. These tags can vary in size from the short His6 tag to the 30-kDa GFP tag. Such tags are required for the detection of the expressed recombinant proteins and, in conjunction with an appropriate affinity separation method, for their purification (more steps of purification can be introduced using multiple tags). For a review of different expression systems, see Braun and LaBaer (2003).

In one example, a human fetal brain cDNA library was subcloned into a bacterial expression vector, permitting controlled IPTG-inducible expression of His6-tagged recombinant proteins. The E. coli clones of this library were then arrayed at high density onto PVDF membranes and grown overnight, and recombinant protein expression was induced for a controlled length of time. Those clones expressing a human protein are readily detected by means of an anti-His6 antibody. In this fashion, a protein array containing 10 000 to 12 000 different human proteins has been generated (Bussow et al., 1998). Approximately 66% of the proteins expressed in this library are, according to present annotation, full length. Also, such proteins have been shown to be readily expressed and purified in high throughput using available automated platforms (Braun et al., 2002; Lueking et al., 2003). It is such approaches that today allow the generation of protein microarrays.

3. Protein microarrays: toward the proteome on a chip

Once a source of proteins has been established, consideration must be given to the format that a protein microarray can have. For example, there are presently a number of possible surfaces available for the manufacture of such chips. In general, the available surfaces can be divided into two major categories: flat (planar) and 3D (mostly gel-like) surfaces (Angenendt et al., 2002; Angenendt et al., 2003). Planar surfaces have been primarily developed in the area of cDNA arrays. In general, the glass surface of these chips is treated to produce a thin layer of a particular chemical group, for example, an aldehyde group or poly-L-lysine (Haab et al., 2001). The proteins are bound to the surface of these chips by either covalent bonds or simple electrostatic charge. Another type of planar chip surface available is plastic polymer-coated slides, such as the MaxiSorp slides from Nunc, a surface chemistry that has been used in ELISAs for many years. The simple adoption of technology from the area of cDNA arrays brings with it a number of major concerns when used with proteins. First, unlike DNA, the surface charge of proteins is highly variable, and the use of simple uniform electrostatic interaction for the immobilization of different proteins results in large variation in the amount of protein bound. Second, the structural conformation of proteins deposited onto a planar surface cannot be expected to mimic “native” proteins very closely, but would be more similar to that of proteins present on membranes, such as in traditional Western blotting or dot blot experiments. Third, there is no control over the orientation of the proteins on the surface, which may result in, for example, the inaccessibility of an active site.

The development of the gel-like 3D chip surface was partly driven by an attempt to minimize the denaturation of the immobilized proteins in the arrays. A number of such surfaces exist, based on polyacrylamide or agarose coated on a glass surface, which provide a hydrophilic environment for the proteins. Such surfaces allow the user to adjust various conditions, such as pH and salt concentrations, by incubating the chips in the appropriate buffer. Using a “homemade” polyacrylamide surface, a glass chip containing 2413 nonredundant purified human fusion proteins, arrayed at a density of up to 1600 proteins per cm², has been successfully employed in antibody binding studies, including the screening of human serum samples (Lueking et al., 2003), indicating that the proteins involved retain a degree of structural conformation. There are also non-gel-like 3D surfaces available, such as the FAST slides from Schleicher and Schuell, which have a nitrocellulose surface. Like any 3D surface, this surface allows a much higher concentration of protein per spot. One advantage of the nongel 3D surface is the increased shelf life of these slides.
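
To give a feel for what a density of 1600 proteins per cm² means in practice, the following minimal Python sketch works through the arithmetic. It uses only the two figures quoted above (2413 proteins, 1600 spots per cm²); the square spotting grid and the function names are assumptions made for illustration and are not taken from Lueking et al. (2003).

```python
import math

# Figures quoted in the text (Lueking et al., 2003).
N_PROTEINS = 2413            # nonredundant purified human fusion proteins
MAX_DENSITY_PER_CM2 = 1600   # upper spotting density, proteins per cm^2


def min_array_area_cm2(n_spots: int, density_per_cm2: float) -> float:
    """Smallest area (cm^2) that holds n_spots at the given density."""
    return n_spots / density_per_cm2


def spot_pitch_um(density_per_cm2: float) -> float:
    """Centre-to-centre spacing in micrometres.

    Assumption (not from the source): spots sit on a square grid.
    """
    pitch_cm = 1.0 / math.sqrt(density_per_cm2)
    return pitch_cm * 1e4  # 1 cm = 10 000 um


area = min_array_area_cm2(N_PROTEINS, MAX_DENSITY_PER_CM2)
pitch = spot_pitch_um(MAX_DENSITY_PER_CM2)
print(f"minimum area: {area:.2f} cm^2")
print(f"spot pitch:   {pitch:.0f} um")
```

Under these assumptions the whole set fits in roughly 1.5 cm² with a spot spacing of about 250 µm, which is why such an array can be accommodated on a standard glass slide.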

However, these surface solutions still leave the problem of controlling the orientation of the proteins on the surface of the slides, which is necessary to maximize the activity of these proteins. For example, in order to maximize the binding of antibodies arranged on a surface to their epitopes, it would clearly be an advantage if the heavy chain were attached to, or close to, the surface and the antigen-binding site were as far away from the surface as possible. One approach has been the use of affinity tags; for example, a nickel-coated slide has a natural and specific affinity for His6-tagged recombinant proteins. This approach was used to array 5800 yeast proteins that were screened for their ability to interact with calmodulin and phospholipids (Zhu et al., 2001). Similarly, successful orientation of antibodies and antibody Fab fragments was achieved by biotinylating such antibodies/Fab fragments and arraying them on a streptavidin-coated surface (Peluso et al., 2003). A further development of surface chemistry involves the use of a polyethylene glycol (PEG) layer (Angenendt et al., 2003) or dendrimers (Benters et al., 2002; Benters et al., 2001). These approaches involve the coupling of the proteins to epoxy groups, which act as spacers preventing direct protein-surface contact and thus eliminate the need for blocking reagents to reduce background binding. One further development of this approach has been to link chelating iminodiacetic acid groups to PEG, which in turn can be bound by Cu2+ ions and so provide a highly specific binding site for His6-tagged proteins (Cha et al., 2004). The authors demonstrated one additional, and potentially very important, use for such technology: the elimination of the need to prepurify tagged recombinant proteins for arrays.

A number of studies have compared the different surfaces available for protein array work, including antibody arrays, assessing background noise, sensitivity/detection limits, reproducibility, and storage for a variety of experimental designs (Angenendt et al., 2002; Angenendt et al., 2003).

4. Microfluidic chips

While the development of 3D surfaces and spacers goes some way to address the problems faced when looking at protein-protein interactions on a chip, many experiments that involve interactions of proteins in a functional state will prove difficult to perform on these chips, which is a major drawback of the current protein arrays. Ideally, we would like to look at protein interactions where the proteins are in their native state and are functional, that is, in conditions as close as possible to those in nature. This may be solved by developing a microfluidic chip, which is a series of enclosed microchannels within a chip made of a material such as silicon, plastic, or glass. The potential of microfluidic chips would include the ability to maintain proteins in their functional conformations and therefore to study protein-protein, protein-peptide, protein-compound, protein-DNA, protein-ligand, and protein-antibody interactions in solution. Also, the area of enzyme studies would greatly profit from such a system. In fact, the first steps have been taken to use a “lab-on-a-chip” to study some of the reactions in the glycolytic pathway of yeast (Young et al., 2003). Using this system, enzymatic reactions in volumes as low as 6.3-8 nL could be studied (Dietrich et al., 2004). The ability to use such small volumes in microfluidic chips is very important because the cost of proteins, antibodies, compound libraries, and peptides, and even the difficulty of obtaining enough patient sample, can be prohibitive when screening in high throughput. Such technology would also allow us to fully exploit the libraries of proteins that currently exist.

Recent developments show that microchannels with a cross-section of 100 µm × 100 µm and lengths of centimeters can be fabricated in glass, silicon, and plastic (Guber et al., 2004). However, there are current technological challenges that need to be addressed, such as loading, positioning, and manipulating samples in these chips, handling nanoliter volumes, and scanning the chip.
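
The volumes involved follow directly from the channel geometry. The minimal Python sketch below assumes a straight channel with a rectangular 100 µm × 100 µm cross-section and asks how much liquid 1 cm of channel holds, and how long a 6.3 nL plug (the lowest reaction volume quoted in the previous section) would be in such a channel. Combining these two figures is purely illustrative, since they come from different studies, and the function names are hypothetical.

```python
UM3_PER_NL = 1e6   # 1 nL = 1e-12 m^3 = 1e6 cubic micrometres
UM_PER_CM = 1e4    # 1 cm = 10 000 micrometres


def channel_volume_nl(width_um: float, height_um: float, length_cm: float) -> float:
    """Volume (nL) of a straight channel with a rectangular cross-section."""
    volume_um3 = width_um * height_um * length_cm * UM_PER_CM
    return volume_um3 / UM3_PER_NL


def plug_length_um(volume_nl: float, width_um: float, height_um: float) -> float:
    """Length (um) of channel occupied by a liquid plug of the given volume."""
    return volume_nl * UM3_PER_NL / (width_um * height_um)


per_cm = channel_volume_nl(100, 100, 1.0)
plug = plug_length_um(6.3, 100, 100)
print(f"volume per cm of channel: {per_cm:.0f} nL")
print(f"length of a 6.3 nL plug:  {plug:.0f} um")
```

Under these assumptions each centimeter of channel holds about 100 nL, and a 6.3 nL reaction would occupy well under a millimeter of channel, which illustrates why sample positioning and handling are listed among the open challenges.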

5. Applications of protein arrays

In principle, present protein arrays provide a tool to examine, on a large scale, interactions of proteins with other proteins (including antibodies), peptides, DNA/RNA, or chemical compounds. In one proof-of-principle experiment, a small number of well-defined protein-protein interactions, including an interaction dependent on a small molecule, were demonstrated in a microarray format (MacBeath and Schreiber, 2000). While the field is still somewhat in its infancy, a number of studies have been completed in which high-throughput screening of protein microarrays has proven successful. As mentioned above, calmodulin- and phospholipid-interacting proteins have been identified by screening almost 6000 yeast proteins, generated by expressing previously annotated open reading frames (Zhu et al., 2001). A human protein expression library, involving in situ expression of tens of thousands of recombinant proteins on large membranes (Bussow et al., 1998; Lueking et al., 1999), has been successfully screened to demonstrate antibody-protein interactions. This same library has also provided a source of purified recombinant proteins for microarrays, which were demonstrated to be a successful platform for the large-scale study of protein-antibody interactions (Lueking et al., 2003), an important step toward large-scale studies of antibody specificity. By screening thousands of different proteins from the relevant organism with a particular antibody, it is possible to determine which proteins contain antigens recognized by the antibody in question. In another study (Michaud et al., 2003), 11 polyclonal and monoclonal antibodies were screened against 5000 yeast proteins, and the results demonstrated the cross-reactivities of many of the antibodies screened.

One further development of the study of protein-antibody interactions is the use of high-content protein arrays to determine the targets of antibodies previously identified as potentially interesting markers in disease. Our group is presently working to identify the antigens of antibodies initially identified by immunohistochemistry as interesting markers in certain types of cancer (Figure 1 shows a pipeline for this work). Each antibody can be screened against an appropriate protein expression library, arrayed either on PVDF membranes or, as purified proteins, on glass chips. Unlike previous approaches, which could at best identify potential epitopes of a particular antibody, this approach identifies the actual target protein. In addition, potential cross-reacting proteins can also be identified using this method, which is important information for assessing an antibody in disease-screening procedures. Similarly, protein arrays could provide a useful screening technique to be introduced into any antibody production process. For example, during the production of monoclonal antibodies in mice immunized with a particular antigen, the antibody-producing B cells are fused with myeloma cells to form hybridomas, which can then be cultured for large-scale expression of the antibody. During this process, the supernatant from each hybridoma clone is screened for the ability of the antibody present to bind to the target antigen. By introducing a screen of a large-content protein array at this point, the antibody could be tested not only for its ability to bind the known antigen, but valuable data on specificity/cross-reactivity could also be generated.

Figure 1 Pipeline showing integration of protein array technology into cancer marker discovery and characterization. Antibodies that are potentially useful as cancer markers are produced and tested, for example, using immunohistochemistry. Protein array technology can then be introduced to confirm/discover the protein target bound by the antibody, and assess the specificity/cross-reactivity.

Just as it is possible to characterize the binding of a single antibody using protein arrays, it is also possible to characterize many antibodies present in a single sample, for example, to profile the antibody repertoire in serum or plasma. One immediate application is the use of “allergen arrays” to screen for the presence of particular IgE molecules in a patient sample. The traditional approach involves the use of simple extracts from potential allergens. Such extracts, containing both allergens and nonallergens, are commonly used in skin prick tests to determine the possible source of an allergic reaction in the patient. Using modern arraying technology and recombinant allergens (e.g., pollen and fungus proteins), relatively large arrays have been produced for screening purposes (Hiller et al., 2002; Deinhofer et al., 2004; Jahn-Schmid et al., 2003; Wiltshire et al., 2000). These arrays can also readily include nonprotein allergens such as latex. Such arrays are readily miniaturized, permitting the screening of very low volumes of a patient’s blood, and are also more accurate in identifying the precise source of the allergic response.

In a similar fashion, protein arrays can be used to profile antibodies present in the blood of patients with autoimmune diseases. Initial autoantigen arrays consisted of almost 200 proteins, peptides, and other biomolecules (including several forms of dsDNA and ssDNA) that were known autoantigens in several well-characterized autoimmune diseases, including rheumatoid arthritis and systemic lupus erythematosus (Robinson et al., 2002). Another approach is to use large arrayed libraries of recombinant proteins as a potential autoantigen array. In a proof-of-principle experiment, a protein array chip containing almost 2500 purified recombinant human proteins was used to profile the autoantibodies present in small-volume samples from patients with alopecia and rheumatoid arthritis (Lueking et al., 2003). This approach permitted the identification of previously known autoantigens and also of previously uncharacterized protein autoantigens. Initial results from screening a large recombinant mouse protein array also indicate the usefulness of this approach for characterizing autoimmune disease in a mouse model system for systemic lupus erythematosus (C. Gutjahr, 2005). Protein arrays can contribute enormously to our understanding of the general mechanisms involved in autoimmunity, as well as provide a platform on which to develop the technology for more complete diagnostics of the various autoimmune diseases.

Presently, the use of protein array screening to identify protein-protein (nonantibody) interactions is still very much in its infancy and limited to specialized laboratories. Many proteins are difficult to study in solution (e.g., membrane proteins) or may require the presence of various cofactors. One approach to studying such “difficult” proteins is peptide screening.

Figure 2 shows an outline of the approach taken by our group to identify proteins that interact with the cytoplasmic tail of a membrane protein, in this case a platelet integrin (Larkin et al., 2004). The particular conserved α-integrin cytoplasmic motif, KVGFFKR, had previously been shown to play a critical role in the regulation of activation of the platelet integrin αIIbβ3 (Stephens et al., 1998). In order to discover the molecular mechanisms involved in this regulation, it is necessary to discover which proteins interact with this integrin, and more specifically with this region of the cytoplasmic tail. In order to overcome the difficulties associated with working with entire membrane proteins, a tagged peptide (biotin-KVGFFKR) was synthesized corresponding to the region of interest. This peptide was then screened against a high-density array of 37 000 E. coli clones expressing recombinant human proteins (Bussow et al., 1998; Lueking et al., 1999), and 19 clones, coding for 13 different proteins, were identified as binding the labeled peptide. Of these 19 clones, strong binding could be shown between the labeled peptide and purified proteins isolated from three clones. Of these three clones, one codes for a protein that could not be shown to be present in platelets, and two code for a putative chloride channel, ICln, shown for the first time to be present in platelets. A number of experiments were carried out, including peptide pulldown assays and coprecipitation experiments, to confirm the interaction between ICln and integrin αIIbβ3 (see Figure 2 and Larkin et al., 2004). Such an experiment reveals the enormous potential of a protein array approach, not only in identifying novel protein interactions but also in teasing apart biological pathways in general.
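
The numbers reported for this screen describe a steep selection funnel, which the following minimal Python sketch summarizes. All counts are taken from the paragraph above; the derived quantities (hit rate, clones per protein, fraction of hits confirmed as strong binders) and the variable names are illustrative and are not statistics reported by Larkin et al. (2004).

```python
# Counts quoted in the text for the biotin-KVGFFKR peptide screen.
CLONES_SCREENED = 37_000    # E. coli clones on the high-density array
POSITIVE_CLONES = 19        # clones binding the labeled peptide
DISTINCT_PROTEINS = 13      # different proteins encoded by those clones
STRONG_BINDERS = 3          # clones whose purified protein bound strongly
ICLN_CLONES = 2             # strong binders coding for ICln

# Derived, illustrative quantities (not reported in the source).
hit_rate = POSITIVE_CLONES / CLONES_SCREENED
clones_per_protein = POSITIVE_CLONES / DISTINCT_PROTEINS
confirmed_fraction = STRONG_BINDERS / POSITIVE_CLONES

print(f"primary hit rate:          {hit_rate:.3%}")
print(f"clones per hit protein:    {clones_per_protein:.2f}")
print(f"hits confirmed as strong:  {confirmed_fraction:.1%}")
print(f"strong binders that are ICln: {ICLN_CLONES} of {STRONG_BINDERS}")
```

Read this way, roughly one clone in two thousand gave a primary signal, and only a small share of those survived confirmation with purified protein, which underlines why follow-up assays such as pulldown and coprecipitation are needed before a hit is treated as a genuine interaction partner.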

Figure 2 Example of the identification of protein-protein interactions using protein arrays. In order to identify proteins that potentially interact with the cytoplasmic tail of the platelet integrin αIIbβ3, a labeled peptide of this region was generated and a protein array library with over 37 000 clones was screened. One of the proteins identified, the chloride channel ICln, was further characterized and confirmed as an interaction partner of the integrin in biological systems (Larkin et al., 2004).

Applications of protein array technology such as target identification and characterization, target validation, diagnostic marker identification and validation, preclinical study monitoring, and patient typing seem to be feasible. We have recently reviewed the combined efforts in genomics, proteomics, and biochip technology and their impact on the overall drug development process (Huels et al., 2002). For the first time, tools are available to study disturbances within biological systems, such as disease or drug treatment, at the gene and protein expression levels.

Proteins, as targets, dominate pharmaceutical R&D, with ligand-receptor interactions and enzymes forming the large majority, comprising ~45% and 28%, respectively, of the targets (Drews, 2000). Additionally, many therapeutic proteins, especially humanized antibodies, are in clinical development. The ultimate tool for meaningful high-throughput screening would test new leads or new targets in a highly parallel manner. Some examples of applications in this direction already exist. Recently, an immunosensor array has been developed that enables the simultaneous detection of clinical analytes (Rowe et al., 1999). Here, capture antibodies and analytes were arrayed on microscope slides using flow chambers in a cross-wise fashion. The current format is low-density (6 × 6 pattern) but has high-throughput potential, as it involves automated image analysis and microfluidics; it is emerging as one of the future formats for enzyme activity testing and other assays (Cohen et al., 1999). In another study, small sets of active enzymes were immobilized in a hydrophilic gel matrix. Enzymatic cleavage of the substrate could be detected, and inhibitors blocked the reaction (Arenkov et al., 2000). More recently, an enzyme array that is suitable for assays of enzyme inhibition has been reported (Park and Clark, 2002). Initial publications in the area of receptor-ligand interaction studies in a microarray format have shown that the interaction of immobilized compounds with proteins in solution can be determined (Zhu et al., 2001; MacBeath and Schreiber, 2000; Mangold et al., 1999). This technology allows high-throughput screening of ligand-receptor interactions with small sample volumes.

The multiparallel possibilities of protein array applications have the potential not only to allow the optimization of preclinical, toxicological, and clinical studies through better selection and stratification of individuals but also to affect how diagnostics are used in drug development.
