10.1. Long-Term Storage
A laboratory engaging in LC-MS/MS analysis of complex samples
can generate large volumes of data over the lifetime of an
instrument. Many organizations have data retention policies that
require MS data to be stored for three to five years or,
occasionally, indefinitely. Although this is not strictly a
bioinformatics problem,
it is something that a laboratory has to be aware of and needs to
budget for. Fortunately, every year the cost of storage decreases
and the capacity increases. Computers equipped with large RAID
or dedicated network-attached storage systems are the preferred
choice for long-term data storage and can easily store between
1 and 10 TB of data. However, it should be stressed that these
systems are not infallible: multiple disk failures can coincide, or
the RAID controller hardware can fail, potentially corrupting the
data. Backing up to offline tape storage or removable hard disks
as well is therefore highly recommended.
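One way to detect the kind of silent corruption described above is to record a checksum for every archived file and re-check it against the backup copy later. The following is a minimal sketch under stated assumptions, not a procedure from this text: the function names (`build_manifest`, `compare_to_manifest`) and the choice of SHA-256 are illustrative only.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large raw files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_to_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose checksum changed."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

In this sketch the manifest would be built once when data are archived, stored separately from the archive itself, and compared against both the primary storage and each backup at regular intervals; any path returned by `compare_to_manifest` should be restored from a copy that still matches.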
10.2. Raw Data Standards
Because the raw data formats used by the manufacturers are not
interchangeable, a number of open standards have been devised
by the proteomics community. mzData was the first such format; it
was initiated by the HUPO/PSI and designed by committee over a
2-year period. While the mzData format was in development, the
mzXML format was quickly developed by the Institute for Systems
Biology for use in the TPP suite of applications (Table 4.9).
Finally, a revised format that merged the mzData and mzXML
formats, called mzML, was released in June 2008.
The mzML format was developed by a working group consisting
of the HUPO/PSI committee, SPC/ISB, instrument vendors,
and other proteomics software groups. There are a number of
FOSS and commercial applications available that can convert raw
data from all the popular instruments into mzData, mzXML, and
mzML formats. However, many of these tools still rely on software
libraries that are part of the MS vendor software. A free viewer
available from Insilicos can read and display data in all three
open formats. If the mzData, mzXML, or mzML formatted data files
contain profile data, they still need to undergo peak picking
before database searching.
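To illustrate the last point, the sketch below first checks whether the spectra in an mzML document are flagged as profile or centroid, and then applies a deliberately naive peak picker. It rests on several assumptions: the representation is read from the PSI-MS controlled vocabulary terms MS:1000128 (profile spectrum) and MS:1000127 (centroid spectrum) placed directly on each spectrum element, the function names are hypothetical, and the three-point weighted centroid is only a stand-in for the more careful algorithms used by real conversion tools.

```python
import xml.etree.ElementTree as ET

MZML_NS = "{http://psi.hupo.org/ms/mzml}"
CENTROID = "MS:1000127"  # PSI-MS term "centroid spectrum"
PROFILE = "MS:1000128"   # PSI-MS term "profile spectrum"


def spectrum_representations(mzml_text):
    """Map each spectrum id to 'profile', 'centroid', or 'unknown'.

    Assumption: the term sits on the spectrum itself. Files that put it
    in a referenceable parameter group will be reported as 'unknown'.
    """
    root = ET.fromstring(mzml_text)
    result = {}
    for spectrum in root.iter(MZML_NS + "spectrum"):
        accessions = {
            cv.get("accession") for cv in spectrum.iter(MZML_NS + "cvParam")
        }
        if PROFILE in accessions:
            result[spectrum.get("id")] = "profile"
        elif CENTROID in accessions:
            result[spectrum.get("id")] = "centroid"
        else:
            result[spectrum.get("id")] = "unknown"
    return result


def pick_peaks(mz, intensity, min_intensity=0.0):
    """Naive centroiding: weighted m/z over each local maximum's neighbours."""
    peaks = []
    for i in range(1, len(mz) - 1):
        if intensity[i] < min_intensity:
            continue
        if intensity[i] > intensity[i - 1] and intensity[i] >= intensity[i + 1]:
            window_mz = mz[i - 1:i + 2]
            window_int = intensity[i - 1:i + 2]
            total = sum(window_int)
            centroid = sum(m * w for m, w in zip(window_mz, window_int)) / total
            peaks.append((centroid, intensity[i]))
    return peaks
```

A spectrum reported as `'profile'` would have its decoded m/z and intensity arrays passed through a peak picker before being handed to a search engine. Decoding the binary arrays is omitted here, and for files of realistic size a streaming parser would be preferable to loading the whole document as a string.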
10.3. Experiment Result Standards
In a similar fashion to the raw data standards, there are multiple
standards for database search results. The HUPO/PSI standard
mzIdentML (née AnalysisXML) is currently in review and will start
to be supported by applications from the summer of 2009 onward.
mzIdentML conforms to both the minimum information about a
proteomics experiment (MIAPE) (73) and the MCP guidelines (35).
While the HUPO/PSI standard was being designed, the pepXML and
protXML standards were developed as part of TPP (Table 4.9).
PepXML can be used only for MS/MS data and includes only the
“raw” peptide match data;