5.2 Automatic Extraction and Validation of MIAPE Information
Whereas MIAPE documents describe what information must be provided
for a given experiment, the task of the proteomics data representation
formats defined by HUPO-PSI is to provide a means to
store this information in a standardized manner. For example, the
mzML data schema [22] represents all the information related to the
processing of a sample by a mass spectrometer,
including instrument configuration, acquisition parameters, and
the generated data itself, such as mass spectra and chromatograms.
The mzIdentML data schema [23, 24] represents all the
information related to the bioinformatics analysis of mass spectra
that leads to the identification of peptides and proteins, that is,
the description of the process by which mass spectra are
assigned to amino acid sequences belonging to protein databases.
This includes all parameters used when submitting spectra to a
search engine and, possibly, to validation tools, as well as the results
themselves, that is, the resulting peptide and protein lists with
their attributed confidence levels. In addition to the PSI standards, other
data formats aim to store relevant information from proteomics
experiments. For example, the PRIDE data schema [39, 40]
was developed by the European Bioinformatics Institute (EBI) in
parallel to mzML and mzIdentML for the purpose of converting
data and meta-data from proteomics experiments not expressed in
PSI standard formats. PRIDE has become one of the most important
repositories of mass spectrometry-based protein identification data,
currently containing almost 300 million spectra, more than
50 million peptides and almost 9 million proteins in its database.
In the near future, the PRIDE team will natively support mzML and
mzIdentML data in their import workflows.
A growing number of software tools can read and/or write these
standard data formats (see http://psidev.info/mzml for mzML and
http://psidev.info/mzidentml for mzIdentML), including free,
open-source and commercial tools. These standards greatly facilitate
the deposition of proteomics data in public repositories, which most
proteomics journals require or strongly recommend to authors
publishing MS data and results.
Alignment between the data contained in standard files and the
information required by the MIAPE guidelines is therefore of crucial
importance. In other words, before storing standard files in a
repository, it is necessary to check whether the encoded information
is MIAPE-compliant and, if it is not, to detect which information is
missing and to provide a way to include it. The ProteoRed MIAPE Web
Toolkit (PMWTK) [41] does this, and provides a MIAPE quality stamp
for a given file, indicating that the experiment is fully described
and could be evaluated and potentially reproduced by other scientists.
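
The kind of check described above can be illustrated with a minimal Python sketch. It is not the PMWTK implementation, and the checklist it uses is a hypothetical, much-reduced stand-in for the actual MIAPE guidelines. The sketch assumes that the presence of an mzIdentML element (matched by its local name, ignoring the XML namespace) is enough to count an item as reported; a real validator would also inspect attribute values and controlled-vocabulary terms. The function names, the checklist and the example document are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified checklist (NOT the official MIAPE requirements):
# human-readable item -> mzIdentML element expected to carry that information.
REQUIRED_ELEMENTS = {
    "analysis software": "AnalysisSoftware",
    "sequence database": "SearchDatabase",
    "enzyme": "Enzyme",
    "precursor mass tolerance": "ParentTolerance",
    "fragment mass tolerance": "FragmentTolerance",
    "identification threshold": "Threshold",
}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def missing_items(xml_text, required=REQUIRED_ELEMENTS):
    """Return the checklist items whose element is absent from the document."""
    root = ET.fromstring(xml_text)
    present = {local_name(element.tag) for element in root.iter()}
    return [item for item, name in required.items() if name not in present]


def is_compliant(xml_text, required=REQUIRED_ELEMENTS):
    """A document passes this toy check when no checklist item is missing."""
    return not missing_items(xml_text, required)


# Deliberately incomplete, made-up document used only to exercise the check.
EXAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <AnalysisSoftwareList>
    <AnalysisSoftware id="sw1" name="example engine"/>
  </AnalysisSoftwareList>
  <DataCollection>
    <Inputs>
      <SearchDatabase id="db1" location="example.fasta"/>
    </Inputs>
  </DataCollection>
</MzIdentML>"""

if __name__ == "__main__":
    print("compliant:", is_compliant(EXAMPLE))
    for item in missing_items(EXAMPLE):
        print("missing:", item)
```

Reporting the missing items, rather than a bare pass or fail, mirrors the requirement that the missing information be detected so that it can be added before the file is deposited.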