Biomedical Engineering Reference
information content. Chemometrics is the science of extracting the relevant information
that is coming from various chemical sensors and analyzers by applying advanced
mathematical and statistical algorithms. It is commonly employed especially when
spectral measurement systems are used in a given process. For instance, in a bioreactor
setting where off-line or online samples are taken for NIR and/or HPLC analysis
(or off-gas GC analysis) at certain cultivation intervals (e.g., every 12 or 24 h), the resulting
data set is complex enough that it requires pretreatment (such as normalizing, scaling,
denoising, and outlier removal), and the relevant information must be extracted at each
sampling time point (e.g., NIR wavenumbers that correlate with certain metabolites).
Spectral data sets of this type may be noisy (with missing data points and outliers)
and highly collinear, which makes conventional analytical techniques (such as ordinary
multiple linear regression) inappropriate for explaining them (conventional techniques
become mathematically unstable when presented with this type of data). Chemometrics
tools include, but are not limited to, principal components analysis (PCA), partial least
squares (PLS), continuum regression, evolving factor analysis, and principal components
regression (PCR). PCA and PLS are by far the most widely used techniques in practical
applications and are now available in commercial off-the-shelf (COTS) software.
Another major advantage of using chemometrics techniques such as PCA and PLS is
that while explaining the complex data sets, these techniques mathematically reduce the
dimensionality of the system so that only the relevant information is extracted by using a
few variables instead of many hundreds or thousands. PCA does this by finding the
directions of greatest variation, and PLS by additionally correlating these directions with
output (response) variables that are thought to be affected by many inputs. A short summary of
both techniques is provided in the following sections.
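The collinearity problem and the pretreatment step described above can be illustrated with a small synthetic example. The sketch below is not taken from the text: the function names (make_collinear_spectra, autoscale), the two Gaussian "pure component" bands, and the noise level are all assumptions chosen for illustration. It shows that when there are far more spectral channels than samples, the data matrix is rank deficient (so ordinary least squares has no unique solution), while a few directions carry almost all of the variance.

```python
import numpy as np


def make_collinear_spectra(n_samples=20, n_channels=200, noise=0.01, seed=0):
    """Simulate spectra driven by two hypothetical analytes (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Concentrations of two made-up analytes for each sample.
    conc = rng.uniform(0.0, 1.0, size=(n_samples, 2))
    # Two broad, overlapping Gaussian bands stand in for pure-component spectra.
    axis = np.linspace(0.0, 1.0, n_channels)
    centers = np.array([0.3, 0.7])
    pure = np.exp(-((axis[None, :] - centers[:, None]) ** 2) / (2 * 0.15 ** 2))
    # Every channel is a mixture of the same two signals plus measurement noise.
    return conc @ pure + noise * rng.normal(size=(n_samples, n_channels))


def autoscale(X):
    """Pretreatment: mean-center each channel and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


X = make_collinear_spectra()
Xs = autoscale(X)

# With 20 samples and 200 channels the centered matrix has rank of at most 19,
# so X^T X (200 x 200) is singular and OLS coefficients are not unique.
rank = np.linalg.matrix_rank(Xs)

# Share of the total variance carried by each direction (from the SVD).
singular_values = np.linalg.svd(Xs, compute_uv=False)
explained = singular_values ** 2 / np.sum(singular_values ** 2)

print("channels:", Xs.shape[1], "numerical rank:", rank)
print("variance share of first two directions:", explained[:2].sum())
```

In this synthetic setting the two leading directions account for most of the variance because only two underlying factors were simulated; real spectra would need the number of components to be chosen from the data.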
12.7 THE POWER OF PLS AND PCA
A large number of variables, interactions among the variables, and complex process
dynamics (such as biological dynamic behavior) make process data difficult to
analyze. Multivariate data modeling is used to reduce the dimensionality of
the problem and to handle collinearity and missing data.
When only the input space or output space is to be analyzed, or when just the major
sources of variability in a given process or system are of concern (not necessarily the
input-output relations), PCA is employed. PCA decomposes the covariance matrix of the
data set (in a linear fashion) to explain the maximum variability and remove noise in the process.
Therefore, it finds the major variability directions within the data set to explain the
overall system behavior with a few components as shown in Fig. 12.7. In this example,
measurements are made on three variables many times (e.g., across batches, repeats,
or multiple spectra), depicted by the points that form a data volume or cloud stretched
in certain directions according to the process or system characteristics and measurement
system variability. PCA finds the first major variability direction (also called principal
component one, PC1), removes this explained variation from the data set (working on the
residuals after the first iteration), and looks for the second major variability direction
(PC2). As shown in Fig. 12.7, a three-variable process can be explained by one or two
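
The sequential procedure described in this section (find the direction of greatest variability, remove the variation it explains, then repeat on the residuals) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method behind Fig. 12.7: the three-variable data cloud is synthetic, the names first_direction and sequential_pca are hypothetical, and power iteration is used here as one simple way of finding each direction.

```python
import numpy as np


def first_direction(R, n_iter=500, tol=1e-12):
    """Unit vector along which the rows of R vary the most (power iteration)."""
    C = R.T @ R
    # Start from the column of C belonging to the highest-variance variable.
    p = C[:, np.argmax(np.diag(C))].copy()
    p /= np.linalg.norm(p)
    for _ in range(n_iter):
        p_new = C @ p
        p_new /= np.linalg.norm(p_new)
        converged = np.linalg.norm(p_new - p) < tol
        p = p_new
        if converged:
            break
    return p


def sequential_pca(X, n_components):
    """Extract components one at a time, deflating the residuals after each."""
    R = X - X.mean(axis=0)            # work on mean-centered data
    total = np.sum(R ** 2)
    loadings, explained = [], []
    for _ in range(n_components):
        p = first_direction(R)        # current major variability direction
        t = R @ p                     # scores: projection of each sample on p
        explained.append(np.sum(t ** 2) / total)
        R = R - np.outer(t, p)        # remove the explained variation
        loadings.append(p)
    return np.array(loadings), np.array(explained)


# Synthetic three-variable cloud stretched mainly along one direction.
rng = np.random.default_rng(1)
d1 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
d2 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
X = (3.0 * rng.normal(size=(300, 1)) * d1
     + 1.0 * rng.normal(size=(300, 1)) * d2
     + 0.1 * rng.normal(size=(300, 3)))

loadings, explained = sequential_pca(X, n_components=2)

# Cross-check against a direct eigendecomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
v_top = eigvecs[:, -1]                # eigenvector of the largest eigenvalue

print("PC1 direction:", loadings[0])
print("variance share of PC1 and PC2:", explained)
```

Deflation is what makes the second direction orthogonal to the first: once the PC1 scores are subtracted, the residuals contain no variation along PC1, so the next search can only find a new direction.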