need to be calculated, which requires quantification of the
identified peptides (Box 1.1 and Figure 1.2D). In stable
isotope labeling approaches that produce pairs or higher
multiples of peptide isotope patterns in the MS spectra, one
can use algorithms that provide very precise estimates of
peptide abundance ratios. In MaxQuant this is done by
comparing the full elution profiles and isotope patterns of
the labeled partners. Once peptide ratios are calculated, they
need to be combined in appropriate ways to obtain protein
ratios. In isobaric labeling techniques, the relative peptide
abundances are read out at specific mass values in the MS/MS
spectra [75]. Here, special attention needs to be devoted
to the distortion of the signal by co-fragmented peptides
and to filtering the peptides accordingly for quantification
[76
a few years ago many thought that this problem would be
unsolvable even in principle [79]. Fortunately, it has now
become clear that the dramatic improvements in the pro-
teomic workflow do indeed allow complete characteriza-
tion of proteomes.
Like its genome, the proteome of the yeast model
system was the first to be completely analyzed [2]. Haploid
and diploid yeasts were SILAC-labeled, mixed and
measured together. With a combination of different
approaches, 4400 yeast proteins were identified with 99%
confidence, a larger number than detected either by
genome-wide TAP (tandem affinity purification) tagging or
GFP (green fluorescent protein) tagging of all yeast open
reading frames [11,80]. The most strongly regulated genes
belonged to the yeast mating pathway, most of whose members
are expressed at very low levels and are only functionally
relevant in haploid yeast. However, not all members of this
pathway were differentially regulated, which indicates that the
unregulated members have additional roles in other cellular processes.
The total dynamic range of the yeast proteome under these
basal conditions turned out to be between 10^4 and 10^5.
A targeted analysis of the yeast proteome likewise
identified proteins across its entire dynamic range [81].
SRM assays were developed on triple quadrupole instru-
ments for members of the glycolysis pathway, and
expression changes upon metabolic shifts were measured
across multiple time points in relatively short LC-MS runs.
Recently, our group has proposed 'single-shot proteo-
mics' as a complement to the shotgun and targeted
approaches: single-shot proteomics simply means the
analysis of as much of the proteome as possible by a single
LC-MS/MS run [82]. Its attractions are that sample
consumption and measurement times are very low, while the
large-scale, unbiased, systems-biology character of the
measurement is preserved. Employing recent
advances in chromatography, mass spectrometry and bio-
informatics, the yeast proteome can now be covered
almost completely in this mode. This was illustrated by
investigating the heat-shock response of the yeast pro-
teome in quadruplicate measurements with nearly
complete coverage and with about a day of
78]. Finally, samples can be measured without
isotopic labeling, which is referred to as 'label-free
quantification'. In this case, the runs should be optimally
aligned, and further normalization steps should be
included to make peptide signals from different LC-MS
runs comparable to each other. This is computationally
challenging, in particular if the samples are each pre-frac-
tionated into several LC-MS runs.
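
Two of the quantification steps described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MaxQuant implementation: the source says that peptide ratios are combined "in appropriate ways" and that "normalization steps" are applied, without naming them. Here the median of log2 peptide ratios is assumed as the protein-level summary, and median centering of log2 intensities is assumed as the run normalization. The function names and the example peptides are hypothetical.

```python
import math
from statistics import median


def protein_log2_ratio(peptide_ratios):
    """Summarize peptide ratios (e.g. heavy/light) as one protein log2 ratio.

    Assumption: the median of the log2 ratios is used, because it is
    robust against single outlier peptides.
    """
    logs = [math.log2(r) for r in peptide_ratios if r > 0]
    if not logs:
        raise ValueError("no positive peptide ratios")
    return median(logs)


def median_normalize(runs):
    """Make label-free runs comparable by aligning their median log2 intensity.

    `runs` maps a run name to {peptide: raw intensity}. Each run is shifted
    in log2 space so that all runs end up with the same median.
    Assumption: most peptides do not change between runs.
    """
    log_runs = {
        name: {pep: math.log2(v) for pep, v in peptides.items() if v > 0}
        for name, peptides in runs.items()
    }
    run_medians = {name: median(vals.values()) for name, vals in log_runs.items()}
    target = median(run_medians.values())
    return {
        name: {pep: x - run_medians[name] + target for pep, x in vals.items()}
        for name, vals in log_runs.items()
    }


# One outlier peptide (ratio 64) does not dominate the protein ratio.
example_ratio = protein_log2_ratio([2.0, 2.0, 64.0])

# Hypothetical runs: run2 was measured with twice the overall signal.
example_runs = {
    "run1": {"PEPA": 1000.0, "PEPB": 2000.0, "PEPC": 4000.0},
    "run2": {"PEPA": 2000.0, "PEPB": 4000.0, "PEPC": 8000.0},
}
normalized = median_normalize(example_runs)
```

After normalization, the same peptide has the same log2 intensity in both example runs, because the only difference between them was a global factor of two. Real data would additionally require retention-time alignment of the runs, which is not shown here.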
In addition to the basic workflow described so far,
which provides quantitative protein expression data,
several additional downstream computational tasks need to
be performed. Fortunately, once the proteomic expression
data matrix has been obtained, many statistical and
computational methods that were developed for microarray
data analysis can be re-used for proteomics. For instance,
clustering, principal component analysis, tests for differ-
ential regulation, time series, pathway and ontology
enrichment analysis and many other methods can be
applied just as well to proteomics data. The Perseus module
of MaxQuant assembles these capabilities into a single,
user-friendly platform for high-resolution proteomic data.
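
As an illustration of a test for differential regulation applied to a proteomic expression matrix, the following minimal sketch ranks proteins by Welch's t statistic between two groups of samples. It is not how Perseus is implemented; the function names, the protein identifiers and the intensity values are hypothetical, and a real analysis would also compute p-values and correct for multiple testing.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two groups of log2 intensities.

    Uses the sample variance of each group and does not assume
    equal variances.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two replicates")
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both groups have zero variance")
    return (mean(a) - mean(b)) / se


def rank_by_regulation(matrix, control_cols, treated_cols):
    """Rank proteins by the absolute t statistic, treated versus control.

    `matrix` maps a protein name to a list of log2 intensities,
    one value per sample column.
    """
    scores = {}
    for protein, values in matrix.items():
        treated = [values[i] for i in treated_cols]
        control = [values[i] for i in control_cols]
        scores[protein] = welch_t(treated, control)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical matrix: columns 0-2 are control, columns 3-5 are treated.
expression = {
    "HSP_EXAMPLE": [10.0, 10.2, 9.8, 12.0, 12.2, 11.8],
    "HOUSEKEEPING_EXAMPLE": [15.0, 15.2, 14.8, 15.1, 14.9, 15.0],
}
ranked = rank_by_regulation(expression, control_cols=[0, 1, 2], treated_cols=[3, 4, 5])
```

The same matrix layout (proteins as rows, samples as columns) is what clustering or principal component analysis would take as input, which is why methods from microarray analysis carry over directly.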
Modeling in systems biology has so far relied on either
mRNA levels as proxies for protein expression or on small-
scale protein data that monitored only a few different
molecular species. In the future, modeling will surely
benefit from the increasing availability of large-scale and
precise proteomics data.
total measurement time [83].
The human proteome is more complex than the yeast
proteome (Figure 1.3A), but until very recently it was
unknown how many different proteins a single cell line
actually expresses. Using deep shotgun proteomics
approaches, two different human cancer cell lines have
been investigated in depth
[3,5]. Both studies found that such cell lines contain at least
10 000 different proteins. Saturation analysis [5] or
comparison to deep RNA-seq data [3] suggested that this
number is not very far from the total number of expressed
proteins with functional roles in these cells. A subsequent
study of 11 commonly used cell
DEEP EXPRESSION PROTEOMICS
One of the limitations of proteomics so far has been its
inability to probe the proteome in great depth. Over the last
few decades, 2D gel electrophoresis, for instance, has
produced gels that visualized hundreds or thousands of
spots. Upon identification, however, they generally proved
to derive from a very small number of highly expressed
genes. The difficulties in exploring the proteome in depth
are mostly related to the 'dynamic range problem', that is,
the difficulty of measuring extremely low-abundance
proteins in the presence of very high-abundance ones. Until
lines also identified