Biomedical Engineering Reference
depend on the problem at hand and optimality of the final result is not always
guaranteed, but GA-based methods have been proposed to be superior to
regression approaches [139]. In this sense, they are similar to ANNs. However,
GAs give explicit information about important and unimportant features because
feature selection is integrated into the method [140]. Goodacre et al. [141-144]
pioneered the use of GAs for spectroscopic analyses, and their group and
others have reported numerous applications [145, 146].
8.3.8 Unsupervised Learning
In unsupervised classification there is no explicit knowledge of the categories to
which the data belong. The system forms clusters or natural groupings of input
data. “Natural” here is defined explicitly or implicitly in the clustering system
itself. It must be noted that, for a given data set and categories, different
clustering algorithms could lead to different clusters. Hence, clustering is best
used as a visualization and discovery tool whose results must be carefully
examined and validated. In contrast to supervised methods, which require
initial knowledge of the data, unsupervised methods require a secondary source
of knowledge to validate their results.
8.3.9 k-Means Clustering
k-Means clustering (Fig. 8.9) [147, 148] is one of the simplest unsupervised
learning algorithms; it aggregates the observed data into a number of classes,
or clusters, fixed a priori. The main idea is to define k centroids, one for
each cluster, that act as aggregating markers for the data. The next step is to
take each data point and associate it with the nearest centroid. When no point
is pending, the first step is completed and an early grouping is done. At this
point, k new centroids are recalculated as the barycenters of the clusters
resulting from the previous step. With the k new centroids in place, each data
point is again associated with its nearest centroid. As this loop repeats, the
k centroids change their location step by step until no further changes are
observed. The data can then be understood in terms of the centroids. k-Means
is also a good exploratory tool when data form natural groups. It usually
converges rapidly and is easy to implement. The number of classes (k) must
be known, or a reasonable guess must be possible so that a small number of
candidate models can be evaluated. The initial points, however, can strongly
affect the final outcome and may give rise to misleading clusters when there
are no natural clusters.
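The assign-and-recalculate loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name kmeans, the parameters max_iter and seed, the choice of k randomly drawn data points as initial centroids, and the toy two-dimensional data are all assumptions made here for illustration.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: the initial centroids are k data points drawn at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):  # iteration cap guarantees termination
        # Assignment step: associate each point with its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no association changed, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the barycenter of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points each.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
        (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
centroids, labels = kmeans(data, k=2)
```

Rerunning the sketch with different seed values, or with a k that does not match the number of natural groups, is a simple way to see how much the initial points and the choice of k influence the clusters obtained.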
The choice of spectral information used is also critical, and extensive
preprocessing is often required, including dimensionality reduction. There is
also the possibility of getting stuck in an infinite loop, and implementations
counter this possibility with a termination condition. Data would usually
require a second examination before conclusions are reached. The ease of
implementation and rapid application are especially suited to imaging
applications, in which classes can be easily visualized using different colors
and unnatural clusters can