Biology Reference
In-Depth Information
of utilizing several different raw data file formats, including .dta files. Fortunately, there are scripts available for converting data from a variety of different mass spectrometer instruments into a .dta format. A list of other available software programs for analyzing MS2 data is provided in Table 2.

Protein Databases
Beyond a mass spectrometer and software for turning raw MS data into peptide identifications, a protein database against which to search the data is also required. One of the first questions asked of an investigator requesting protein identification is the species from which the sample was taken. The question is asked because the MS (and MS2) data need to be analyzed against a database containing the protein sequences from that particular species. Although ideally proteomics will be able to identify MS2 spectra using de novo sequencing, this step will require several more years of development before it becomes mainstream.
The databases used for protein identification can be composed of amino acid sequences or of nucleotide sequences that are translated into protein sequences. The number of entries within these databases has grown exponentially over the years, in step with the increasing speed of genome sequencing. Although there are many publicly available sequence databases, most experimental MS data are analyzed against one of the following three databases: the UniProt knowledgebase (UniProtKB), the NCBI nonredundant (NCBI nr) protein database, and the International Protein Index (IPI) database. These databases are regularly updated to provide proteomic investigators access to the latest available sequence information.
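To make concrete why the data must be searched against protein sequences from the sample's own species, the following is a minimal sketch of a database lookup, not the method used by any of the programs named in this chapter. It digests each database protein in silico with a simple trypsin rule (cleave after K or R, but not before P) and reports peptides whose calculated monoisotopic mass falls within a tolerance of an observed mass. The two-entry TOY_DATABASE, the function names, and the 0.01 Da tolerance are all assumptions made for illustration. Real search engines also score the MS2 fragment ions rather than relying on peptide mass alone.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added per free peptide

# Hypothetical two-protein "species database" (illustrative sequences only).
TOY_DATABASE = {
    "protA": "MKTAYIAKQR",
    "protB": "GGSKPLLR",
}


def tryptic_digest(sequence):
    """Split a protein after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


def search_by_mass(observed_mass, database, tolerance=0.01):
    """Return (protein_id, peptide) pairs within `tolerance` Da of the mass."""
    hits = []
    for protein_id, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            if abs(peptide_mass(peptide) - observed_mass) <= tolerance:
                hits.append((protein_id, peptide))
    return hits


if __name__ == "__main__":
    observed = peptide_mass("TAYIAK")  # stand-in for a measured peptide mass
    for protein_id, peptide in search_by_mass(observed, TOY_DATABASE):
        print(protein_id, peptide)
```

A peptide can only be matched if its parent protein is present in the database being searched, which is why the species of origin has to be known before the search is run.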
TABLE 2 Software Available for Protein Identification by Analysis of Tandem Mass Spectrometry Data
Software Program    URL
Sequest             http://www.thermo.com
Mascot              http://www.matrixscience.com
MS-Tag              http://prospector.ucsf.edu
Pep-Frag            http://prowl.rockefeller.edu
OMSSA               http://pubchem.ncbi.nlm.nih.gov/omssa
Sonar MS/MS         http://hs2.proteome.ca/prowl/sonar/sonar_cntrl.html
X!Tandem            http://www.thegpm.org/tandem
Crux                http://noble.gs.washington.edu/proj/crux

Top-Down Mass Spectrometry
As described earlier, most proteins identified by MS are identified using a bottom-up approach, in which the proteins are digested into peptides and the identification of these peptides is used as a surrogate for identification of the proteins. The major deficiency of this strategy is that it does not provide direct evidence of important biological parameters such as alternative splice forms, diverse modifications, and variant sites of protein cleavage. Consider a protein that has the potential to be phosphorylated. Within the cell, it will likely exist in multiple forms containing 0, 1, or multiple phosphorylated residues. If this protein is identified using a bottom-up approach, it would be impossible to assign each form of the protein its correct modifications with 100% certainty. For example, c-Met possesses multiple potential phosphorylation sites. Phosphorylation of a single site may cause protein activation, and phosphorylation of multiple sites may cause the protein to become deactivated.6 The activation state of c-Met is a major indicator of how a patient with small cell lung carcinoma will respond to treatment; therefore, knowing the relative amounts of the various phosphorylated versions of this protein