extracted from the sample one by one, the first component being
the one which explains most of the systematic variation in the data.
Plotting the principal components quickly reveals the structure of
the data, helping to find sample clusters and identify outliers.
If groups can be distinguished, it is worth going deeper into the
analysis and studying which variables are most correlated with each
principal component, in order to define the biological processes
hidden in the data. The correlation of each variable with each
component is contained in the so-called loading matrix. These analyses
can be performed in R using prcomp{stats}, princomp{stats}, and
biplot{stats}. We recommend starting data processing with a PCA to
quickly detect outliers and, later in the analysis, once the outliers
have been removed, performing a complete PCA that also defines the
biologically interesting variables.
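As a minimal sketch of this PCA workflow, the following R code runs prcomp{stats} on a simulated abundance matrix. The data, the object names (abund, pca, loadings_cor), and the simulated treatment effect are assumptions made for illustration only; they are not taken from the protocol.

```r
# Simulated data (assumption): 12 samples (rows) x 50 proteins (columns)
set.seed(1)
abund <- matrix(rnorm(12 * 50), nrow = 12,
                dimnames = list(paste0("S", 1:12), paste0("P", 1:50)))
# Shift the first 10 proteins in samples 7-12 to mimic a treatment effect
abund[7:12, 1:10] <- abund[7:12, 1:10] + 3

# Centre and scale every variable, then extract the principal components
pca <- prcomp(abund, center = TRUE, scale. = TRUE)

# Proportion of the total variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample scores on PC1 and PC2: clusters and outliers become visible here
plot(pca$x[, 1:2], pch = 19,
     col = rep(c("grey40", "firebrick"), each = 6))
text(pca$x[, 1:2], labels = rownames(abund), pos = 3)

# Correlation of each variable with the first three components
# (the loading matrix in the sense used in the text)
loadings_cor <- cor(abund, pca$x[, 1:3])

# The ten variables most strongly correlated with PC1
top_pc1 <- names(sort(abs(loadings_cor[, "PC1"]), decreasing = TRUE))[1:10]
```

Note that pca$rotation holds the eigenvectors rather than correlations, which is why the variable-to-component correlations are computed explicitly here. After removing any outlying samples, the same calls can be repeated on the reduced matrix.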
3.8.2 Independent Component Analysis
In contrast to PCA, Independent Component Analysis (ICA)
decomposes an input dataset into components so that each component
is as statistically independent from the others as possible.
ICA can be used to extract mixed signals from datasets while
reducing the effects of noise or artifacts. ICA has been reported to
be more powerful than PCA, and faster and more robust than ANOVA,
when dealing with proteomics data [14, 15]. In R, the {fastICA}
package is recommended.
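The idea of unmixing independent signals can be sketched with the fastICA() function of the {fastICA} package. The two source signals, the mixing matrix, and all object names below are hypothetical and only serve to show the call. The code assumes the package is installed and does nothing otherwise.

```r
# Assumes the fastICA package is installed: install.packages("fastICA")
if (requireNamespace("fastICA", quietly = TRUE)) {
  set.seed(1)
  n <- 500
  # Two hypothetical independent sources: a sine wave and a sawtooth
  S <- cbind(sin((1:n) / 20), ((1:n) %% 50) / 50 - 0.5)
  # Hypothetical mixing matrix: each observed variable is a blend of both
  A <- matrix(c(0.6, 0.4, 0.3, 0.7), nrow = 2)
  X <- S %*% A   # observed data, rows = observations, columns = variables

  # Estimate two independent components from the mixed data
  ica <- fastICA::fastICA(X, n.comp = 2)

  # ica$S holds the estimated sources and ica$A the estimated mixing matrix.
  # The sign and order of the recovered components are arbitrary.
  print(round(cor(S, ica$S), 2))
}
```

With real data, X would be the samples-by-variables abundance matrix, and the number of components (n.comp) has to be chosen by the analyst.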
3.8.3 Partial Least Squares Discriminant Analysis
Partial least squares (PLS) is a multivariate projection-based method
that, unlike PCA or ICA, maximizes the covariance between two datasets
by seeking linear combinations of the variables from both sets (these
linear combinations are called latent variables). In a classical
partial least squares discriminant analysis (PLS-DA), the response
variable is categorical and indicates the classes (treatments) of the
samples. PLS-DA is used to solve a wide range of classification/
discrimination problems in a supervised way, determining which
variables show the highest covariance with the different treatments.
The {mixOmics} package contains a set of tools for performing PCA,
PLS, and other multivariate analyses focused on -omics data [16].
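A PLS-DA along these lines might look as follows with {mixOmics}. This is only a sketch: the simulated matrix, the two class labels, and the choice of two components are assumptions, and the code is skipped when the package (distributed through Bioconductor) is not installed.

```r
# Assumes mixOmics is installed, e.g. BiocManager::install("mixOmics")
if (requireNamespace("mixOmics", quietly = TRUE)) {
  set.seed(1)
  # Simulated data (assumption): 20 samples x 30 proteins, two treatments
  X <- matrix(rnorm(20 * 30), nrow = 20,
              dimnames = list(paste0("S", 1:20), paste0("P", 1:30)))
  Y <- factor(rep(c("control", "treated"), each = 10))
  # The first five proteins respond to the treatment
  X[Y == "treated", 1:5] <- X[Y == "treated", 1:5] + 2

  # Supervised projection: latent variables that covary with the classes
  fit <- mixOmics::plsda(X, Y, ncomp = 2)

  # Sample plot on the first two latent variables
  mixOmics::plotIndiv(fit, legend = TRUE)

  # Variable importance in projection (VIP): higher values indicate
  # variables contributing more to the class separation
  vip_scores <- mixOmics::vip(fit)
  print(head(sort(vip_scores[, 1], decreasing = TRUE), 5))
}
```

Because PLS-DA is supervised, it can separate classes even in noise. Some form of cross-validation is advisable before interpreting the selected variables.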
3.8.4 Clustering and Heat Maps
Clustering of expression data is usually done to identify proteins
with similar behavior, implying that they are correlated. This
exploratory technique has clearly proven valuable and is complementary
to multivariate statistics. Representing the different pathways and
visualizing the integrated data across time series or treatments can
improve data interpretation and is sometimes also helpful for
selecting candidate variables. Pearson's correlation coefficient
combined with Ward's aggregation method has been reported as the best
clustering strategy for proteomics data, with Euclidean distance
and UPGMA as another valid strategy [17]. The R package {gplots} can
be used to plot these graphs.
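Both clustering strategies mentioned above can be sketched in base R, with the heat map drawn by heatmap.2() from {gplots} when that package is available. The simulated expression matrix, the helper names (cor_dist, ward), the number of clusters, and the use of "ward.D2" as the Ward variant are assumptions for illustration.

```r
# Simulated data (assumption): 30 proteins (rows) x 8 samples (columns)
set.seed(1)
expr <- matrix(rnorm(30 * 8), nrow = 30,
               dimnames = list(paste0("P", 1:30), paste0("S", 1:8)))
# Two hypothetical co-regulated groups responding in samples 5-8
expr[1:10, 5:8]  <- expr[1:10, 5:8] + 3
expr[11:20, 5:8] <- expr[11:20, 5:8] - 3

# Strategy 1: Pearson correlation distance between rows + Ward aggregation
cor_dist <- function(x) as.dist(1 - cor(t(x), method = "pearson"))
ward     <- function(d) hclust(d, method = "ward.D2")
row_tree <- ward(cor_dist(expr))
clusters <- cutree(row_tree, k = 3)   # k = 3 is an arbitrary choice here

# Strategy 2: Euclidean distance + UPGMA (average linkage)
upgma_tree <- hclust(dist(expr, method = "euclidean"), method = "average")

# Heat map with row and column dendrograms built by strategy 1
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(expr, distfun = cor_dist, hclustfun = ward,
                    scale = "row", trace = "none",
                    col = gplots::bluered(50))
}
```

The vector clusters assigns each protein to a group, which can then be inspected for shared pathways or used to shortlist candidate variables.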