fed with a list of predefined peptide species and their cor-
responding fragments. It then simply records series of
transitions from precursor to fragment ions; this is referred
to as selected or multiple reaction monitoring (SRM or
MRM) [38]. Both shotgun and targeted approaches have
their advantages and drawbacks: the shotgun approach does
not require prior development of peptide-specific assays
and in principle can measure the entire proteome. There-
fore, it is the method of choice for the discovery phase of
proteomic studies. However, it may require extensive
measurement time and proteins of interest may be missed.
In contrast, the targeted approach can be performed rapidly
and in principle without pre-fractionation, but is necessarily
biased in the sense that only predefined peptides are
measured.
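The targeted acquisition logic described above can be illustrated with a minimal sketch: a predefined transition list is cycled through, and only ion pairs matching a listed precursor-to-fragment transition are recorded. The peptides, m/z values and tolerance below are invented for illustration, not taken from any real assay or instrument API.

```python
# Sketch of SRM/MRM monitoring: the instrument cycles through a predefined
# list of precursor -> fragment transitions and records an intensity trace
# over retention time for each one; unlisted ions are ignored.
# All peptides, m/z values and tolerances here are hypothetical.

TRANSITIONS = {
    "ELVISK": [(417.7, 629.4), (417.7, 530.3)],   # (precursor m/z, fragment m/z)
    "PEPTIDER": [(478.8, 702.4)],
}

def match_transition(prec_mz, frag_mz, tol=0.5):
    """Return (peptide, transition) if the observed ion pair matches a
    predefined transition within the m/z tolerance, else None."""
    for peptide, pairs in TRANSITIONS.items():
        for p, f in pairs:
            if abs(prec_mz - p) <= tol and abs(frag_mz - f) <= tol:
                return peptide, (p, f)
    return None

def record(events):
    """Accumulate per-transition intensity traces from a stream of
    (retention_time, precursor m/z, fragment m/z, intensity) events."""
    traces = {}
    for rt, prec, frag, inten in events:
        hit = match_transition(prec, frag)
        if hit is not None:
            traces.setdefault(hit, []).append((rt, inten))
    return traces

# One event matches a listed transition, the other does not and is dropped.
events = [(10.1, 417.7, 629.4, 1200.0), (10.2, 500.0, 300.0, 50.0)]
traces = record(events)
```

This is also why the approach is "necessarily biased": anything absent from `TRANSITIONS` is invisible to the measurement.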
The most promising approach is probably a hybrid one,
which is facilitated by the latest generation of mass spec-
trometric hardware: a combination of general shotgun
sequencing with targeted sequencing of a list of preselected
candidates. Another interesting hybrid approach has been
called SWATH-MS and involves the acquisition of frag-
ment data for all precursor masses in consecutive mass
windows of 25 m/z units (termed 'swaths') across the entire
mass scale in rapid succession. When combined with tar-
geted data extraction, this enables repeated scanning of the
same fragment ion maps for quantification of proteins or
peptides of interest [39].
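The SWATH windowing scheme can be sketched as follows. The 25 m/z window width is the value given above; the 400–1200 m/z precursor range and the example precursor are assumptions chosen only for illustration.

```python
# Sketch of SWATH-style acquisition windows: the precursor mass scale is
# covered by consecutive isolation windows of fixed width (25 m/z units),
# and all precursors falling into a window are fragmented together.
# The 400-1200 m/z range is an assumed, illustrative choice.

WINDOW_WIDTH = 25.0
MZ_START, MZ_END = 400.0, 1200.0

def swath_windows(start=MZ_START, end=MZ_END, width=WINDOW_WIDTH):
    """Return (low, high) m/z boundaries of the consecutive windows."""
    edges = []
    low = start
    while low < end:
        edges.append((low, min(low + width, end)))
        low += width
    return edges

def window_index(mz, start=MZ_START, width=WINDOW_WIDTH):
    """Index of the isolation window covering a given precursor m/z."""
    return int((mz - start) // width)

windows = swath_windows()   # 32 windows of 25 m/z covering 400-1200
idx = window_index(523.3)   # hypothetical precursor; falls in windows[idx]
```

Because every cycle revisits the same fixed windows, the resulting fragment ion maps can be re-queried later for any peptide of interest, which is the basis of the targeted data extraction mentioned above.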
Relative or absolute quantification has increasingly
become the focus of proteomics experiments and has
largely replaced the initial goal of only generating accurate
and complete lists of identified proteins [40]. This is
a challenging task because mass spectrometry is not
inherently quantitative. A number of elegant approaches
have been developed that now make MS the most quantitatively accurate protein technology by far; these are summarized in Box 1.1.

The correct identification and quantification of peptides by MS/MS sequencing, the assembly of a series of peptide sequences into protein identifications and the integration of peptide quantification into protein quantification become increasingly challenging as the complexity of the sample increases. These tasks can only be dealt with correctly using rigorous statistical methods. To this end, a plethora of software tools and mass spectral search engines have been developed, which are discussed in the next section.

COMPUTATIONAL PROTEOMICS

An important aspect of high-throughput technologies is the availability of suitable computational workflows supporting the analysis and interpretation of the large-scale datasets that are routinely generated in current systems biology. Modern MS-based proteomics measurements produce data at rates similar to those of deep sequencing experiments on cellular DNA and RNA. For all of these technologies it is challenging to produce condensed representations of the data in a form and amount suitable for biological interpretation, in a reasonable timeframe and within the constraints of the available computer hardware. In the early days of MS-based proteomics, the interpretation of spectra for the purpose of identifying and quantifying peptides and proteins was done in a manual or semi-automatic fashion [41]. Nowadays, however, a single mass spectrometer can generate a million mass spectra per day [42], so it is impractical for a human expert to interpret the raw data one spectrum at a time. It is therefore necessary to employ reliable and efficient computational workflows for the identification and quantification of these enormous amounts of spectral data. Of particular importance is the control of false positives, e.g. by calculating and enforcing false discovery rates (FDRs) with statistical methods that account for the multiple-hypothesis-testing nature of large MS datasets.

Historically, computational proteomics started from the development of peptide search engines, and for this reason software tools have evolved around them. Furthermore, vendors strive to provide software enabling the computational analysis of the output of their instruments; these tools often interface with popular peptide search engines. There is much activity in software development for MS-based proteomics, and dedicated reviews have been published [43–46].

All-encompassing end-to-end computational workflow solutions have also been developed, for instance the freely available Trans-Proteomic Pipeline [47] and MaxQuant [48] software packages. MaxQuant contains a comprehensive set of data analysis functionalities and will be the basis of the subsequent discussion. Furthermore, there is a plethora of individual solutions for more specialized tasks. As examples, ProSight assists in the analysis of top-down protein fragmentation spectra [49,50], special search engines have been developed to identify cross-linked peptides [51,52], and commercial software for the 'de novo' interpretation of fragmentation spectra is available [53,54].

Here we focus on the computational steps that are needed to generate quantitative protein expression values from the raw data. Later chapters in this volume focus on subsequent analysis of this kind of expression data in terms of multivariate data analysis, in the context of biomolecular interaction networks, or in the modeling of biochemical reaction pathways. This initial part of the shotgun proteomics data analysis pipeline can roughly be subdivided into four main components (Figure 1.2): (a) feature detection and processing, (b) peptide identification, (c) protein identification and (d) quantification. Each of these consists of several sub-tasks, some of which are obligatory constituents of the generic data analysis workflow, whereas others address specific questions in particular datasets.
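The FDR control discussed above is commonly implemented with a target-decoy strategy: spectra are searched against the real database and a reversed or shuffled decoy database, and decoy hits above a score cutoff estimate the false positives among the target hits. The following is a minimal sketch of that standard estimate; the peptide-spectrum match scores are invented for illustration.

```python
# Sketch of target-decoy FDR estimation for peptide identification:
# above any score cutoff, the number of decoy hits estimates the number
# of false target hits, so FDR ~= decoys / targets at that cutoff.
# The scores below are invented for illustration.

def fdr_at_cutoff(psms, cutoff):
    """psms: list of (score, is_decoy) peptide-spectrum matches."""
    targets = sum(1 for score, decoy in psms if score >= cutoff and not decoy)
    decoys = sum(1 for score, decoy in psms if score >= cutoff and decoy)
    return decoys / targets if targets else 0.0

def score_threshold(psms, max_fdr=0.01):
    """Lowest score cutoff whose estimated FDR is at most max_fdr."""
    for cutoff in sorted({score for score, _ in psms}):
        if fdr_at_cutoff(psms, cutoff) <= max_fdr:
            return cutoff
    return None

psms = [(90, False), (85, False), (80, True), (75, False), (70, True), (60, False)]
threshold = score_threshold(psms)   # accept identifications scoring >= threshold
```

In practice the estimate is refined (e.g. with q-values and corrections for database size), but the decoy-counting idea underlies the FDR machinery of the workflow tools discussed in the next section.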