Biomedical Engineering Reference
standard normal variate (SNV) [38]. Frequently, a range of data pre-processing
methods is compared, and their effectiveness for a particular application is
assessed by the outcome of the required classification or modelling task.
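As an illustration of the SNV transformation mentioned above, the following is a minimal sketch rather than the procedure of the cited reference. It assumes SNV in its usual form, where each spectrum (row) is centred by its own mean and divided by its own standard deviation. The function name snv and the example spectra are hypothetical.

```python
import numpy as np

def snv(spectra):
    """Centre and scale each spectrum (row) by its own mean and standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    row_mean = spectra.mean(axis=1, keepdims=True)
    row_std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - row_mean) / row_std

# Hypothetical data: the second spectrum is the first one with an added
# offset (10) and a multiplicative factor (2), mimicking scatter effects.
spectra = np.array([[1.0, 2.0, 3.0, 4.0],
                    [12.0, 14.0, 16.0, 18.0]])
corrected = snv(spectra)
```

Because SNV operates on each sample separately, the two example spectra become identical after correction, which is the kind of offset and scale difference the method is intended to remove.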
2.1 Data Characteristics
The introduction section indicated how complex the data structures collected
during bioprocess operation can be. Figure 1 illustrates this, albeit in a somewhat
simplified manner, by including data from raw material quality assessment through
to downstream process monitoring.
In order to gain maximum benefit from data analysis and modelling, the quality
data on raw materials, monitored over time and often using several (multianalyte)
sensors, will need to be linked with quality data monitored during the batch/fed-
batch cultivation at various frequencies for various quality attributes and merged
with online data available from both the cultivation and downstream processing
unit operations.
The varying sampling frequency, together with missing, inaccurate and noisy
data and with means and ranges that differ significantly between individual
process variables, often requires substantial data pre-processing before any
meaningful data interpretation and modelling can be carried out.
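The alignment problem described above can be sketched as follows. This is only one possible approach under stated assumptions: the variable names, the values and the two choices made here (linear interpolation of missing online readings, and matching each offline sample to the most recent online reading) are hypothetical and would need to be justified for a real process.

```python
import numpy as np

# Hypothetical online signal logged every 0.5 h, with one missing reading (NaN).
online_time = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
online_ph = np.array([7.0, 7.1, np.nan, 6.9, 6.8, 6.8, 6.7])

# Hypothetical offline assay sampled less frequently and at irregular times.
offline_time = np.array([0.6, 2.0, 2.9])
offline_titre = np.array([0.2, 1.1, 1.8])

# Fill missing online readings by linear interpolation between valid neighbours.
valid = ~np.isnan(online_ph)
online_filled = np.interp(online_time, online_time[valid], online_ph[valid])

# For each offline sample, pick the most recent online reading at or before it.
# Assumes every offline time is at or after the first online time.
idx = np.searchsorted(online_time, offline_time, side="right") - 1

# One row per offline sample: time, offline titre, matched online pH.
merged = np.column_stack([offline_time, offline_titre, online_filled[idx]])
```

Matching on the last available online reading avoids using information from after the offline sample was taken. Interpolating or averaging over a window are alternatives when the online signal is noisy.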
2.2 Data Scaling
Brereton [6] offers an extensive description of various data scaling approaches,
ranging from single measurement transformations (which should not be required
frequently) to scaling individual variables over all samples or individual samples
over all variables. Various transformations, such as logarithmic or power
transformations, are well established and used in a range of applications. Equally,
there are various methods of scaling, from simple mean or weighted centring for
applications with varying numbers of samples from different populations to
standardisation (often also referred to as normalisation or autoscaling). Whilst
centring simply aligns the means of all the variables without scaling their
ranges, standardisation ensures, by mean centring each variable and dividing it
by its standard deviation, that each variable has a similar influence upon the
resulting model. Alternative methods of adjusting the range of individual
variables prior to the analysis and modelling stage include scaling within a
particular range (usually between 0 and 1, using the maximum and minimum
values of each variable) or block scaling and weighting, which is particularly
useful in applications where data from various analytical techniques, such as
NIR spectra, are combined with a process data measurement matrix of much
smaller dimensions.
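The scaling options described above can be sketched in a few lines of NumPy, with samples in rows and variables in columns. The function names and the random example data are hypothetical. The block scaling shown here (autoscaling followed by division by the square root of the number of variables in the block) is one common choice assumed for illustration, not necessarily the variant described in [6].

```python
import numpy as np

def mean_centre(X):
    """Subtract each variable's mean; ranges are left unchanged."""
    return X - X.mean(axis=0)

def autoscale(X):
    """Mean centre each variable and divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def range_scale(X):
    """Scale each variable to [0, 1] using its minimum and maximum."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

def block_scale(X):
    """Autoscale a block, then weight it so its total variance is 1."""
    return autoscale(X) / np.sqrt(X.shape[1])

# Hypothetical data: 6 samples, a wide spectral block and a narrow process block.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(6, 50))
process = rng.normal(loc=[37.0, 7.0, 150.0], scale=[0.5, 0.1, 20.0], size=(6, 3))

# After block scaling, each block contributes the same total variance,
# so the 50 spectral variables do not dominate the 3 process variables.
combined = np.hstack([block_scale(spectra), block_scale(process)])
```

None of these functions guards against constant variables (zero standard deviation or zero range), which would need to be removed or handled separately before scaling.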