3.6 Database-Dependent Peptide Spectrum Assignment
After acquisition of the tandem mass spectrometry data, the spectra
need to be assigned to peptide sequences. For Arabidopsis, with its
well-annotated genome sequence released by TAIR (www.arabidopsis.org),
this is usually done by a protein sequence database search with a
search algorithm that assigns a peptide sequence to the measured
pattern of mass-to-charge values of the peptide fragments. Although
the general principle of this assignment is the same, a variety of
search algorithms with different scoring schemes exist [10]. To
control the quality of the search and to decide on a suitable score
cutoff, the database searches are best performed against the
Arabidopsis protein sequence database expanded by a concatenated
decoy database [11]. The decoy database must have the same elemental
composition and the same size as the target database.
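
One way to obtain a decoy database that matches the target database in composition and size is to reverse every target sequence. The following is a minimal sketch under that assumption; the text does not prescribe reversal, and the function name, the "REV_" prefix, and the toy sequences are hypothetical.

```python
from collections import Counter


def make_concatenated_database(targets, decoy_prefix="REV_"):
    """Return the target entries plus one reversed decoy entry per target.

    targets: dict mapping accession -> protein sequence.
    The decoy prefix is an illustrative convention, not a standard.
    """
    combined = dict(targets)
    for accession, sequence in targets.items():
        # Reversal keeps length and amino acid composition unchanged.
        combined[decoy_prefix + accession] = sequence[::-1]
    return combined


# Toy sequences for illustration only, not real Arabidopsis proteins.
targets = {"protA": "MKTLLVAGR", "protB": "MSDEQWKPR"}
database = make_concatenated_database(targets)

# Each decoy has the same length and residue counts as its target.
for accession, sequence in targets.items():
    decoy = database["REV_" + accession]
    assert len(decoy) == len(sequence)
    assert Counter(decoy) == Counter(sequence)
```

Because every decoy mirrors exactly one target, the concatenated database contains twice as many entries as the target database, and the decoy half has the same total number of residues.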
Because the decoy database mirrors the target database, the number of
spectrum assignments to decoy entries allows the spectrum false
discovery rate to be assessed for the whole dataset, or for subsets of
the data by applying local false discovery rate calculations. When
defining the database search parameters, the search space should be
restricted to the parameters necessary for peptide identification,
because large search spaces lead to lower scores and a decreased
number of identifications. This issue mainly concerns the inclusion
of posttranslational modifications as variable modifications in
database searches. The general recommendation is therefore to include
low-abundance posttranslational modifications in database searches
only if the corresponding modified peptides have been enriched.
Alternatively, search algorithms such as PepSplice, which carefully
control the search space, may be used [12]. However, we recommend
that inexperienced research laboratories apply standard search tools
such as Sequest [13] or Mascot (Matrix Science, www.matrixscience.com).
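
The use of decoy assignments to choose a score cutoff can be sketched as follows. This is a minimal illustration, assuming a concatenated target-decoy search and the simple estimator "decoy hits divided by target hits" above a cutoff; the function names and the toy scores are hypothetical and do not correspond to the output of any particular search engine.

```python
def fdr_at_cutoff(psms, cutoff):
    """Estimate the spectrum FDR among assignments scoring >= cutoff.

    psms: list of (score, is_decoy) tuples, one per spectrum assignment.
    Assumed estimator: number of decoy hits / number of target hits.
    """
    targets = sum(1 for score, is_decoy in psms if score >= cutoff and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= cutoff and is_decoy)
    if targets == 0:
        return 1.0 if decoys else 0.0
    return decoys / targets


def lowest_cutoff_for_fdr(psms, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    best = None
    for cutoff in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_cutoff(psms, cutoff) <= max_fdr:
            best = cutoff
    return best


# Toy spectrum assignments: (score, is_decoy). Values are made up.
psms = [
    (95, False), (90, False), (85, False), (80, False), (70, True),
    (65, False), (60, False), (50, True), (45, False), (40, True),
]

cutoff = lowest_cutoff_for_fdr(psms, max_fdr=0.2)
print(cutoff, fdr_at_cutoff(psms, cutoff))
```

The estimated rate is not monotonic in the score, so the sketch scans all observed scores rather than stopping at the first cutoff that exceeds the threshold. Applying the same function to a subset of the assignments, for example only modified peptides, gives a rate for that subset.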
3.7 Working with Database Search Results
When analyzing and interpreting the search results of a mass
spectrometry experiment, different questions may be asked of the
data, depending on the scientific aim of the experiment. Complex
experimental setups involve different samples and biological
replicates, multiple measurements for each fraction, one to several
spectrum assignments for the same peptide sequence, posttranslational
modifications at different peptide positions, and one to several
peptide sequences for one protein. Such data require integration in a
relational database. We have therefore developed the pep2pro database
and employed its capacity for the analysis of several large-scale
high-throughput proteomics datasets [14, 15]. To make this analysis
pipeline accessible to users, the pep2pro4all system has been
developed, which consists of a database schema and a script that
populates the database with mass spectrometry data provided in
mzIdentML format [16]. Thus, the database schema of pep2pro can be
used individually for tailored data analysis. The schema is available
from www.pep2pro4all.ethz.ch.
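
To illustrate why a relational layout suits such data, the following sketch builds a small in-memory SQLite database and counts spectra and distinct peptides per protein and sample. It is not the pep2pro or pep2pro4all schema: all table names, column names, and values are hypothetical and chosen only to mirror the relationships described above (spectra to peptides, peptides to proteins).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein (id INTEGER PRIMARY KEY, accession TEXT UNIQUE);
CREATE TABLE peptide (id INTEGER PRIMARY KEY, sequence TEXT UNIQUE);
-- One peptide may map to several proteins and vice versa.
CREATE TABLE peptide_protein (
    peptide_id INTEGER REFERENCES peptide(id),
    protein_id INTEGER REFERENCES protein(id),
    PRIMARY KEY (peptide_id, protein_id)
);
-- One row per spectrum assignment, with its experimental context.
CREATE TABLE spectrum_assignment (
    id INTEGER PRIMARY KEY,
    peptide_id INTEGER REFERENCES peptide(id),
    sample TEXT, replicate INTEGER, fraction INTEGER,
    modification TEXT, mod_position INTEGER, score REAL
);
""")

# Made-up example rows.
conn.executemany("INSERT INTO protein VALUES (?, ?)", [(1, "protA"), (2, "protB")])
conn.executemany("INSERT INTO peptide VALUES (?, ?)",
                 [(1, "LLVAGR"), (2, "SDEQWK"), (3, "TPEPTIDER")])
conn.executemany("INSERT INTO peptide_protein VALUES (?, ?)", [(1, 1), (2, 2), (3, 1)])
conn.executemany(
    "INSERT INTO spectrum_assignment "
    "(peptide_id, sample, replicate, fraction, modification, mod_position, score) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "leaf", 1, 1, None, None, 62.0),
        (1, "leaf", 2, 1, None, None, 58.5),
        (3, "leaf", 1, 2, "Phospho", 1, 47.0),
        (2, "root", 1, 1, None, None, 71.2),
    ],
)

# Spectra and distinct peptides per protein and sample.
rows = conn.execute("""
SELECT pr.accession, sa.sample,
       COUNT(*) AS spectra, COUNT(DISTINCT pe.id) AS peptides
FROM spectrum_assignment sa
JOIN peptide pe ON pe.id = sa.peptide_id
JOIN peptide_protein pp ON pp.peptide_id = pe.id
JOIN protein pr ON pr.id = pp.protein_id
GROUP BY pr.accession, sa.sample
ORDER BY pr.accession, sa.sample
""").fetchall()

for row in rows:
    print(row)
```

Keeping spectrum assignments, peptides, and proteins in separate tables means that a different question, such as which modified positions were observed in which replicate, only requires a different query and no reprocessing of the search results.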