In quantitative shotgun proteomics, each protein abundance ratio is
estimated by combining multiple peptide abundance ratios, often by taking
their average. The standard deviation of the peptide abundance ratios is
then used to measure the variation of the protein abundance ratio estimate.
However, unless the peptide abundance ratios are assumed to be normally
distributed, the standard deviation does not translate directly into a
confidence interval for the protein abundance ratio.
We devised a profile likelihood algorithm to infer the abundance ratios of
proteins from the abundance ratios of isotopically labeled peptides. Given
multiple quantified peptides for a protein, the profile likelihood algorithm
probabilistically weighs the peptide abundance ratios by their inferred
estimation variability, accounts for their expected estimation bias, and
suppresses the contribution from outliers (Figure 8.14). The algorithm yields
a maximum likelihood point estimate and a profile likelihood confidence
interval for each protein abundance ratio. The point estimator is more
accurate than an estimator based on the average of the peptide abundance
ratios. The confidence interval provides an error bar for each protein
abundance ratio that reflects its estimation precision and statistical
uncertainty. Compared with widely used programs (e.g., RelEx), the profile
likelihood algorithm showed not only more accurate protein quantification
and better coverage but also a more robust estimate of the confidence
interval for each differential protein expression ratio.
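The general idea of a profile likelihood point estimate and confidence
interval can be illustrated with a minimal sketch. This is not the algorithm
described above: it omits the bias correction and outlier suppression, and it
assumes a deliberately simple model in which peptide log-ratios are Gaussian
around the protein log-ratio, with variances known up to a common scale
factor. The function names, the relative variances, and the example data are
all hypothetical and chosen only for illustration.

```python
import math

# 95% quantile of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_95 = 3.841458820694124


def profile_loglik(mu, log_ratios, weights):
    """Profile log-likelihood of mu (additive constants dropped).

    Assumed model: x_i ~ Normal(mu, sigma2 / w_i). For a fixed mu, the
    nuisance parameter sigma2 is replaced by its maximizing value.
    """
    n = len(log_ratios)
    s = sum(w * (x - mu) ** 2 for x, w in zip(log_ratios, weights))
    if s == 0.0:
        return math.inf  # degenerate case: every ratio equals mu exactly
    return -0.5 * n * math.log(s / n)


def estimate_protein_log_ratio(log_ratios, rel_variances, step=1e-3):
    """Return (mu_hat, lower, upper) for the protein log-ratio.

    rel_variances are assumed relative variances of the peptide
    log-ratios; a larger value means a less reliable peptide.
    """
    if len(log_ratios) < 2:
        raise ValueError("need at least two peptide ratios")
    weights = [1.0 / v for v in rel_variances]
    # Under this model the maximum likelihood estimate of mu is the
    # inverse-variance weighted mean.
    mu_hat = sum(w * x for x, w in zip(log_ratios, weights)) / sum(weights)
    # Keep every mu whose profile log-likelihood is within half the
    # chi-square quantile of the maximum (likelihood ratio criterion).
    cutoff = profile_loglik(mu_hat, log_ratios, weights) - 0.5 * CHI2_1DF_95
    lower = upper = mu_hat
    while profile_loglik(lower - step, log_ratios, weights) >= cutoff:
        lower -= step
    while profile_loglik(upper + step, log_ratios, weights) >= cutoff:
        upper += step
    return mu_hat, lower, upper


# Hypothetical log2 peptide ratios; the last peptide is assumed noisy.
ratios = [1.0, 1.2, 0.9, 1.1, 3.0]
rel_var = [1.0, 1.0, 1.0, 1.0, 25.0]
mu_hat, lower, upper = estimate_protein_log_ratio(ratios, rel_var)
print(f"log2 ratio {mu_hat:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

In this sketch the noisy peptide is down-weighted rather than removed, so the
estimate stays closer to the four consistent peptides than the plain average
would. The interval comes from the shape of the likelihood itself rather than
from a standard deviation, which is the property the text relies on.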
8.7 Summary
In this chapter, we described the application of various data analysis
algorithms to find useful information in datasets from several different
scientific domains, ranging from biology to materials science and
cheminformatics. Though these domains, and the problems being solved, are
very different, we observe several similarities. For example, noise in the
data is frequently an issue because it can affect any analysis that uses the
data. The representation of the objects in the data is also important: if
the representation captures the key features that are critical to the
analysis problem at hand, the results of the analysis can be improved. We
also saw that some techniques, such as principal component analysis, are
used in several different domains, including biology and materials science.
Finally, we observed that analysis in scientific domains is not just the
application of statistical or data mining algorithms, but a careful
integration of such techniques with domain expertise, along with a careful
and deliberate process of understanding the data input to the algorithms and
interpreting the patterns found in the data.