Biomedical Engineering Reference
range of protein abundance in biological samples. Methodologies used in
high-throughput proteomic analysis were discussed in detail in Chapter 6. Mass
spectrometry is an HTP method that provides better separation than traditional
approaches and greatly reduces inter-run variation, especially when coupled to
HPLC. Here, LC-MS is used as an example to illustrate the procedure and the
issues surrounding the analysis of HTP proteomic data.
The goal of proteomic analysis is the same as that of gene expression microarray
analysis: to identify differentially expressed genes at the protein level. However, the
data are fundamentally different. Data from MS analysis are essentially a list of
(m/z, intensity) pairs; with LC-MS, each pair also carries a third value indicating
the time point (retention time, RT) on HPLC. In LC-MS, the raw data can be viewed
as a 2D gel with one axis for separation by LC (measured in time) and the other for
results from MS (measured in m/z). Each peak, representing a peptide of a given
charge state, is a group of intensity signals that fit an isotopic distribution along
the m/z axis and a Gaussian distribution along the LC axis. One peptide can appear
as several individual peaks because of differences in charge state. The major tasks
in analyzing HTP proteomic data are feature identification (features being the spots
or peaks that represent proteins and peptides), feature quantification, feature
(peak) alignment, and protein identification.
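The data layout described above can be sketched in a few lines of Python. This is only an illustration: the names Signal, neutral_mass, mz_for_charge, and isotope_spacing are hypothetical, retention time is assumed to be in seconds, and only positively charged (protonated) peptides are considered. It shows why one peptide produces several peaks: each charge state z places the same neutral mass at a different m/z, and the isotopic peaks within a charge state are spaced roughly 1/z apart.

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276    # mass of a proton in Da
ISOTOPE_DELTA = 1.00335   # approximate 13C - 12C mass difference in Da


@dataclass(frozen=True)
class Signal:
    """One raw LC-MS data point (hypothetical layout for illustration)."""
    rt: float         # retention time on the LC axis (seconds assumed)
    mz: float         # mass-to-charge ratio on the MS axis
    intensity: float  # detector response


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral peptide mass implied by an m/z value at a positive charge state."""
    return charge * (mz - PROTON_MASS)


def mz_for_charge(mass: float, charge: int) -> float:
    """m/z at which a peptide of the given neutral mass appears at this charge."""
    return mass / charge + PROTON_MASS


def isotope_spacing(charge: int) -> float:
    """Approximate m/z distance between adjacent isotopic peaks."""
    return ISOTOPE_DELTA / charge


if __name__ == "__main__":
    # The same 1000 Da peptide shows up at three different m/z positions.
    for z in (1, 2, 3):
        print(z, round(mz_for_charge(1000.0, z), 4), round(isotope_spacing(z), 4))
```

Mapping each detected peak back to a neutral mass in this way is one simple route to recognizing that peaks at different charge states belong to the same peptide.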
The first step in analyzing LC-MS data is background and noise subtraction.
Data filtration (smoothing) may be necessary to facilitate the subsequent step of
peak detection. A common approach in peak detection is to identify peaks in a
single spectrum and then group similar peaks from adjacent spectra to form the 3D
peak. Peptide peaks vary greatly in size because the peptides differ greatly in
abundance, which poses a great challenge. Currently available peak detection
methods are sensitive only to peaks within a limited size range and depend on
optimal configuration of the detection parameters.
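The two-stage approach described above (find peaks in each spectrum, then chain similar peaks across adjacent spectra) can be sketched as follows. This is a deliberately naive illustration, not the method of any particular tool: the function names, the moving-average smoother, and the parameters window, min_height, and mz_tol are all assumptions made for the example.

```python
def smooth(values, window=3):
    """Moving-average smoothing; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def find_peaks(mz, intensity, min_height):
    """Local maxima at or above min_height in one spectrum, as (mz, intensity)."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        is_max = intensity[i - 1] < intensity[i] >= intensity[i + 1]
        if is_max and intensity[i] >= min_height:
            peaks.append((mz[i], intensity[i]))
    return peaks


def group_across_scans(scans, mz_tol=0.01, min_height=10.0):
    """Chain peaks from adjacent scans whose m/z agree within mz_tol.

    scans: list of (rt, mz_list, intensity_list), ordered by rt.
    Returns a list of features, each a list of (rt, mz, intensity) points.
    """
    open_feats, finished = [], []
    for rt, mz, inten in scans:
        peaks = find_peaks(mz, smooth(inten), min_height)
        used, still_open = set(), []
        for feat in open_feats:
            last_mz = feat[-1][1]
            match = next((j for j, p in enumerate(peaks)
                          if j not in used and abs(p[0] - last_mz) <= mz_tol),
                         None)
            if match is None:
                finished.append(feat)   # feature ended in the previous scan
            else:
                used.add(match)
                feat.append((rt, peaks[match][0], peaks[match][1]))
                still_open.append(feat)
        # Peaks that extended no open feature start new features.
        for j, p in enumerate(peaks):
            if j not in used:
                still_open.append([(rt, p[0], p[1])])
        open_feats = still_open
    return finished + open_feats
```

The single fixed min_height threshold makes the parameter dependence mentioned above concrete: a value low enough to catch low-abundance peptides also admits noise, while a higher value misses small peaks.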
The major challenge in peak alignment lies in RT alignment because of variations
in LC. RT alignment needs to address both time shift and jittering (also referred
to as warping) to achieve optimal LC alignment. The dynamic algorithm by Ono et al.
for adjustment of LC jittering appears to be effective and promising [35]. In terms
of quantification, a peak is measured by height, area, or volume. Quantification
of a peptide needs to take into account its peaks at different charge states.
Protein identification is done through tandem MS analysis.
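To make the idea of correcting both shift and warping concrete, the sketch below aligns two intensity profiles sampled along the RT axis using classic dynamic time warping (DTW). This is a generic textbook technique chosen for illustration; it is not claimed to be the algorithm of Ono et al. [35], and the function name dtw_path and the absolute-difference cost are assumptions.

```python
def dtw_path(ref, sample):
    """Align two intensity profiles with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of
    (ref_index, sample_index) pairs from the first to the last point.
    """
    n, m = len(ref), len(sample)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning ref[:i] with sample[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - sample[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both runs
                                 cost[i - 1][j],      # stretch sample
                                 cost[i][j - 1])      # stretch ref
    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    reference = [0.0, 0.0, 5.0, 0.0, 0.0]
    shifted = [0.0, 0.0, 0.0, 5.0, 0.0]   # same peak, one scan later
    total, path = dtw_path(reference, shifted)
    print(total, path)
```

The returned path maps each time point in one run to its counterpart in the other, so a constant offset and local stretching or compression are handled by the same mechanism. In practice the path is usually constrained so that the warping stays physically plausible.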
Currently, the major challenge in LC-MS data analysis is the lack of tools for
reliable peak detection, peptide quantification, and peptide matching. Undetected
peaks, inaccurate peptide quantification, and peptide mismatching are detrimental
to HTP screening and profiling using LC-MS. One general approach is to use an
automated pipeline for large-scale processing to identify interesting peptides and
then go back and manually verify the accuracy of peak identification, peptide
quantification, and peptide matching. Open-source systems available for LC-MS
data processing include OpenMS/TOPP [36] and SpecArray [37].
Two XML standards have been developed for exchange of MS data: mzXML
[38] and mzData [39]. They were designed with different focuses, and an effort is
currently being made to merge the two standards into one. A domain standard is
important for data exchange among different platforms and analysis tools, and also
makes it convenient to carry along the minimal and adequate experimental infor-