Biomedical Engineering Reference
Two major categories of methods can be distinguished regardless of the linear/
non-linear character of the technique used. In the supervised mode, a 'known'
solution is assumed to exist and examples of data representing various classes are
used during model development to establish classification boundaries between
these. On the other hand, when no underlying classification is assumed or known,
a clustering of data features is identified on the basis of a defined similarity in data
characteristics. In fact, the distinction between classification (supervised pattern
recognition) and clustering (unsupervised) is often not fully recognised [ 12 ].
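The difference between the two modes can be illustrated with a minimal sketch in plain Python. It is not taken from the chapter: the toy data, the 'low'/'high' labels and the function names (fit_supervised, predict, kmeans) are assumptions made for illustration only. A nearest-centroid classifier stands in for the supervised mode and a basic k-means loop for the unsupervised one.

```python
import math

def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def nearest(point, centres):
    """Index of the centre closest to point (Euclidean distance)."""
    return min(range(len(centres)), key=lambda j: math.dist(point, centres[j]))

def fit_supervised(points, labels):
    """Supervised mode: known labels define one centroid per class."""
    classes = sorted(set(labels))
    return {c: centroid([p for p, l in zip(points, labels) if l == c])
            for c in classes}

def predict(model, point):
    """Assign a new point to the class with the nearest centroid."""
    classes = list(model)
    return classes[nearest(point, [model[c] for c in classes])]

def kmeans(points, k, iterations=20):
    """Unsupervised mode: no labels; groups emerge from similarity alone."""
    centres = list(points[:k])  # naive initialisation, adequate for this toy data
    for _ in range(iterations):
        assign = [nearest(p, centres) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = centroid(members)
    return [nearest(p, centres) for p in points]

# Hypothetical two-variable measurements from six batches.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.8, 1.2), (4.8, 5.2)]
labels = ["low", "high", "low", "high", "low", "high"]  # known only in the supervised case

model = fit_supervised(points, labels)      # uses the labels
new_batch_class = predict(model, (4.5, 4.7))
clusters = kmeans(points, 2)                # never sees the labels
```

In the supervised call the class names are given in advance and the model only locates the boundary between them. The k-means call returns arbitrary cluster indices, and whether those groups mean anything has to be judged afterwards.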
Only a selection of the most frequently used methods of clustering and clas-
sification with relevant bioprocessing examples can be discussed in this chapter,
but an extensive range of literature is available covering various aspects of clus-
tering and classification in the biosciences [ 12 , 54 ].
3.1 Application Areas in Biosciences and Bioprocessing
Exploratory data analysis and clustering are widely applied in biosciences for a
range of tasks. Probably the most widely described are the bioinformatics tools used
for identifying patterns in gene expression data under various conditions, whether
in medical applications for identifying biomarkers of particular diseases, in the
biosynthesis of particular products, or in taxonomic studies of the biodiversity and
evolution of microorganisms [ 25 , 51 ]. Various clustering methods, ranging from hierarchical
clustering and k-means clustering to soft clustering methods, fuzzy c-means and
their variations [ 19 ], rely on the specification of a 'similarity' measure or distance
metric that is used to assess whether two data points are sufficiently similar to be
assigned to the same class of objects. These metrics significantly affect the resulting
clustering, and given that the 'correct' clustering is not known in these applications,
the biological plausibility of the resulting cluster structure is typically used to assess
the effectiveness of the analysis. In bioprocessing, the situation is often simplified
when analysing historical data, for which clusters can be user-defined. The cluster
boundaries could be based on, for example, high/low final productivity or nominal
behaviour/deviation, depending on the application [ 9 ].
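The influence of the similarity measure can be shown with a small sketch in plain Python. The three expression profiles and the gene names are invented for illustration, and the two metrics (Euclidean distance and one minus the Pearson correlation) are simply common choices, not ones prescribed by the text. The closest pair under each metric is the first pair that an agglomerative hierarchical clustering would merge.

```python
import math

def euclidean(a, b):
    """Straight-line distance: sensitive to absolute magnitude."""
    return math.dist(a, b)

def correlation_distance(a, b):
    """1 - Pearson correlation: small when two profiles share the same shape."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (spread_a * spread_b)

def closest_pair(profiles, metric):
    """Pair of profile names with the smallest distance under the given metric."""
    names = sorted(profiles)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs, key=lambda p: metric(profiles[p[0]], profiles[p[1]]))

# Hypothetical expression levels at three time points.
profiles = {
    "gene_a": (1.0, 2.0, 3.0),     # rising, low level
    "gene_b": (10.0, 20.0, 30.0),  # rising, high level
    "gene_c": (3.0, 2.0, 1.5),     # falling, low level
}

first_merge_euclidean = closest_pair(profiles, euclidean)
first_merge_correlation = closest_pair(profiles, correlation_distance)
```

With these values the Euclidean metric pairs gene_a with gene_c, which have similar magnitudes, whereas the correlation-based metric pairs gene_a with gene_b, which share the same trend. Neither grouping is 'correct' in itself, which is why biological plausibility is used to judge the outcome.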
The extension of the use of MVDA tools to the analysis of the remaining
'omics' data is an expected development in data analysis [ 17 ]. Although tech-
niques and case studies currently more directly associated with bioprocess mon-
itoring and control will be discussed in more detail below, it is important to point
out that 'omics' data are increasingly used in combination with more traditional
process monitoring [ 49 ], and thus MVDA methods used for the interpretation of
such data are becoming more important from the bioprocessing point of view. The
most frequently reported applications of MVDA will be highlighted with each
technique in the following subsections.