hand, it is increasingly challenging to extract the
relevant information from the overwhelming
amount of the complex multidimensional data
that are generated. These structures require
adapted methodologies to handle and process
raw data, build models able to summarize this
information in a compact and comprehensible
manner, and highlight biomarkers that are relevant
to the biological phenomenon under
study. 17 Although interindividual differences
are intrinsic to biological phenomena, undesirable
variability and bias may also be introduced
during data processing. Numerous steps are
crucial for the appropriate processing of these
data sets in order to distinguish relevant
biomarkers related to sound knowledge from
the mass of recorded signals. Furthermore, the
hyphenation of separative methods with MS
produces massive bidimensional data sets with
a retention time and an m/z dimension. Ions
correspond to (m/z, retention time) pairs that were
called mass spectral tags or mass features. 8
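As an illustration of this bidimensional layout, the following minimal sketch represents each ion as an (m/z, retention time) pair with its intensity. The class name MassFeature, the helper features_from_scans, the in-memory scan layout, and the numeric values are all assumptions made for this example; they do not come from the cited works or from any particular MS software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MassFeature:
    """One ion, identified by its m/z value and retention time."""
    mz: float         # mass-to-charge ratio
    rt: float         # retention time (seconds assumed here)
    intensity: float  # recorded signal intensity


def features_from_scans(scans):
    """Flatten scans into a list of (m/z, retention time) pairs.

    `scans` is a hypothetical in-memory layout: a list of
    (retention_time, [(mz, intensity), ...]) tuples.
    """
    return [
        MassFeature(mz=mz, rt=rt, intensity=intensity)
        for rt, peaks in scans
        for mz, intensity in peaks
    ]


# Toy values chosen for illustration only, not real measurements.
toy_scans = [
    (12.0, [(180.06, 1500.0), (255.23, 300.0)]),
    (12.5, [(180.06, 1720.0)]),
]
features = features_from_scans(toy_scans)
```

In this sketch the same m/z value observed at two retention times yields two separate entries, which is why features generally have to be aligned before they can be compared across data sets.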
Such a data structure requires specific data
processing, and several steps are needed to render
the data meaningful and obtain properly aligned
features detected across multiple data sets.
A workflow for data preprocessing and pretreatment
is proposed in Figure 2.
FIGURE 2 Data preprocessing and pretreatment workflow.
Data Preprocessing
Data processing is a crucial part of the data
workflow, required to extract the relevant
signals from the raw data prior to data mining
and interpretation. 17 It includes data format
conversion, noise filtering, normalization,
chromatographic alignment, peak detection,
deconvolution, and integration. These processes are
discussed in more detail in the following
subsections.
Centroid Data and Format Conversion
Common data acquisition often involves the
combination of multiple m/z signals corresponding
to a given peak into a single data point to
generate centroid data. Each peak is then
characterized by a single monoisotopic m/z value that
corresponds to the main peak of its isotopic
cluster, and its intensity, which is computed
from the intensities of the data points.
Centroiding allows a strong reduction of the data size
and an easier mass assignment, which usually
increases the data quality. 18
As each instrument manufacturer has its own
proprietary file format, the starting point of MS
data processing is the conversion of the raw
data from the original files to an open data format.
This step allows further processing of
the raw data, independently of the MS provider.
Common open data formats are more or less
compact and include netCDF, mzML, and
mzXML. 18,19 Additionally, the data description
is an important parameter, as it ensures the
comparability, long-term collection, and sharing
of the experimental results. 20,21 The files
associated with a single analysis can still be very large,