Chemometric Methods for Biomedical Raman Spectroscopy and Imaging - Emerging Raman Applications and Techniques in Biomedical and Pharmaceutical Fields



depend on the problem at hand and optimality of the final result is not always
guaranteed, but GA-based methods have been proposed to be superior to
regression approaches [139]. In this sense, they are similar to ANNs. However,
GAs give explicit information about important and unimportant features because
feature selection is integrated into the method [140]. Goodacre et al. [141-144]
pioneered the use of GAs for spectroscopic analyses, and their group and
others have reported numerous applications [145, 146].

8.3.8 Unsupervised Learning

In unsupervised classification there is no explicit knowledge of the categories to
which the data belong. The system forms clusters, or natural groupings, of the
input data. “Natural” here is defined explicitly or implicitly by the clustering
system itself. It must be noted that, for a given data set and categories, different
clustering algorithms can lead to different clusters. Hence, clustering is best
used as a visualization and discovery tool whose results must be carefully
examined and validated. Whereas supervised methods require initial knowledge
of the data, unsupervised methods require a secondary source of knowledge
to validate their results.
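The two cautions above (different algorithms can partition the same data differently, and clusters need validation against a secondary source of knowledge) can be made concrete with a small sketch. The measures used here, cluster purity and the Rand index, are common choices but are not named in the text, and the function names, the reference labels ("tumor", "normal") and the cluster assignments are hypothetical values chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def cluster_purity(cluster_labels, reference_labels):
    """Fraction of samples that fall in the majority reference class of their cluster."""
    if len(cluster_labels) != len(reference_labels):
        raise ValueError("label sequences must have the same length")
    counts_by_cluster = {}
    for cluster, reference in zip(cluster_labels, reference_labels):
        counts_by_cluster.setdefault(cluster, Counter())[reference] += 1
    matched = sum(max(counts.values()) for counts in counts_by_cluster.values())
    return matched / len(cluster_labels)


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two partitions agree (needs at least 2 samples).

    A pair counts as an agreement when both partitions put the two samples
    together, or both keep them apart. Cluster numbering does not matter.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical example: eight spectra, clustered by two different algorithms,
# with an independent reference annotation as the secondary source of knowledge.
algorithm_1 = [0, 0, 0, 1, 1, 1, 1, 2]
algorithm_2 = [0, 0, 1, 1, 1, 1, 0, 0]
reference = ["tumor", "tumor", "normal", "normal",
             "normal", "normal", "tumor", "tumor"]

print("purity, algorithm 1:", cluster_purity(algorithm_1, reference))
print("purity, algorithm 2:", cluster_purity(algorithm_2, reference))
print("agreement between algorithms:", rand_index(algorithm_1, algorithm_2))
```

Purity increases trivially as the number of clusters grows, so it is best read alongside the number of clusters used rather than on its own.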

8.3.9 k-Means Clustering

k-Means clustering (Fig. 8.9) [147, 148] is one of the simplest unsupervised
learning algorithms; it aggregates the observed data into a number of classes or
clusters fixed a priori. The main idea is to define k centroids, one for each
cluster, that act as aggregating markers for the data. The next step is to take
each data point and associate it with the nearest centroid. When no point is
pending, the first step is completed and an early grouping is done. At this point
we need to re-calculate k new centroids as barycenters of the clusters resulting
from the previous step. With the k new centroids in place, each point of the same
data set is associated anew with its nearest centroid. As a result of this loop,
the k centroids change their location step by step until no further changes are
observed. The data can now be understood in terms of centroids. k-Means is also a
good exploratory tool when data form natural groups. It usually converges rapidly
and is easy to implement. The number of classes (k) must be known, or a reasonable
guess must be possible so that only a small number of models need to be evaluated.
The initial points, however, can strongly influence the final outcome, and the
method may give rise to misleading clusters when there are no natural clusters.
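The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the cited work: the function names, the random choice of initial centroids from the data, the `max_iter` cap, the restart helper and the six two-dimensional points (standing in for spectra reduced to two scores) are all assumptions introduced here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Associate every point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Re-calculate each centroid as the barycenter (mean) of its cluster."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centroids.append(old)  # assumption: an empty cluster keeps its centroid
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def kmeans(points, k, max_iter=100, seed=0):
    """Alternate assignment and centroid update until the labels stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: initial centroids are data points
    labels = assign(points, centroids)
    for _ in range(max_iter):  # cap on iterations acts as a safety termination
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
    return centroids, labels


def within_cluster_ss(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))


def best_of_restarts(points, k, n_restarts=5):
    """Run k-means from several initializations and keep the tightest result."""
    runs = [kmeans(points, k, seed=s) for s in range(n_restarts)]
    return min(runs, key=lambda run: within_cluster_ss(points, run[0], run[1]))


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9),
          (5.0, 5.1), (4.9, 5.0), (5.1, 4.9)]

centroids, labels = best_of_restarts(points, k=2)
print("centroids:", centroids)
print("labels:", labels)
```

Repeating the run from several initializations and keeping the result with the smallest within-cluster sum of squares is one common way to reduce the dependence on the initial points. It does not help when the data contain no natural clusters, in which case the algorithm still returns k groups.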

The choice of spectral information used is also critical, and extensive
pre-processing is often required, including dimensionality reduction. There is
also the possibility of getting stuck in an infinite loop, and implementations
counter this possibility with a termination condition such as a maximum number
of iterations. The clustered data usually require a second examination before
conclusions are drawn. The ease of implementation and rapid application are
especially suited to imaging applications in which classes can be easily
visualized using different colors and unnatural clusters can
