In a first normalization step, we scale and translate x such that
the two density maxima of x - corresponding to the gray background
and the dark cell color - are mapped onto two fixed values. Subtracting
the mean E(x) then ensures that the data set is centered.
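As an illustration, this normalization might be sketched as follows (a minimal Python sketch; the maxima locations m1, m2 and the target values t1, t2 are hypothetical placeholders, since the fixed values are not specified here):

    import numpy as np

    def normalize(x, m1, m2, t1=0.0, t2=1.0):
        """Affinely map the two density maxima m1 (gray background) and
        m2 (dark cell color) of x onto fixed target values t1 and t2."""
        scale = (t2 - t1) / (m2 - m1)     # scaling factor
        return scale * (x - m1) + t1      # scale and translate

    # Centering: subtract the mean E(x) over the data set
    # X = np.vstack([...])                # (n_samples, n_features)
    # X = X - X.mean(axis=0)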
In order to reduce dimension as well as to decorrelate the data in a
first separation step, we apply principal component analysis (PCA):
the random vector x is linearly transformed so that its components are
decorrelated, and its dimension is reduced by projecting onto the
eigenvectors (principal axes) of the correlation matrix of x that
belong to the largest eigenvalues; see chapter 3.
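A minimal sketch of this PCA step, using numpy (the centered sample matrix X and the number of retained components are placeholders for this data set):

    import numpy as np

    def pca_reduce(X, n_components):
        """Project centered data onto the leading principal axes and whiten."""
        C = np.cov(X, rowvar=False)             # covariance of the features
        eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1][:n_components]
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        # Project onto the principal axes and rescale to unit variance
        return (X @ eigvecs) / np.sqrt(eigvals)

With n_components = 5 this yields the whitened five-dimensional data set described next.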
When analyzing the eigenvalue structure of the training-set covariance,
we note that the first five eigenvalues already account for 95% of the
variance in the data, so projecting onto the corresponding five principal
axes loses little information. Thus, the 400-dimensional data space is
reduced to a whitened five-dimensional data set. A visualization of the
120-sample data set, after projection to three dimensions, is given in
figure 13.5. One can easily see that the cell and non-cell components can
be linearly separated; thus a perceptron (see later) can indeed already
learn the cell classifier. Furthermore, a k-means clustering algorithm
with k = 2 was applied in order to find the two data clusters. They
correspond directly to the cell/non-cell components (see figure 13.5).
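The clustering step can be reproduced with any standard k-means implementation, for instance (a sketch using scikit-learn; Z denotes the whitened five-dimensional data set from the PCA sketch above):

    from sklearn.cluster import KMeans

    # Z: whitened five-dimensional data set (120 samples)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
    # The two resulting clusters are expected to correspond to the
    # cell / non-cell components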
The above result also indicates that unsupervised learning algo-
rithms can produce a meaningful approximation of a cell classifier. We
will confirm this by successful application of independent component
analysis (ICA) . In ICA, given an (observed) random vector, the goal
is to find its statistically independent components. This can be used to
solve the blind source separation (BSS) problem: given only the mixtures
of some underlying independent sources, separate the mixed signals and
thus recover the original sources. In contrast to
correlation-based transformations such as PCA, ICA renders the output
signals as statistically independent as possible by evaluating higher-order
statistics. The idea of ICA was first expressed by Hérault and Jutten
[112], while the term ICA was later coined by Comon in [59]. In the
calculations we used the well-known FastICA algorithm [123] by Hyvärinen
and Oja, which separates the signals using negentropy, and therefore
non-Gaussianity, as a measure of the signal separation quality.
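In code, this separation step might look as follows (a sketch using scikit-learn's FastICA, which implements Hyvärinen and Oja's negentropy-based algorithm; whether its default settings match those used here is an assumption):

    from sklearn.decomposition import FastICA

    # Z: whitened five-dimensional data; whitening was already done by PCA
    ica = FastICA(n_components=5, whiten=False, random_state=0)
    S = ica.fit_transform(Z)    # columns = estimated independent components
    # Compare each column of S with the desired cell/non-cell function,
    # e.g. via np.corrcoef, to find the best-matching component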
Figure 13.6 is a plot of the linearly separated signals together with
the cell/non-cell function for comparison. The fifth component is highly
correlated (cc = 0.9) with the desired output function, so instead of x