Neural computing in pharmaceutical products and process development - Computer-Aided Applications in Pharmaceutical Technology

even construction of clusters with a single member. Selection of the

SOM based topology prior to training does not mean that all clusters

will be populated after training. The most common way of SOM

organization is a square grid, although hexagonal or higher dimension

grids are also used.
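One way to check which map cells end up populated after training is to count how many samples each neuron wins. The following is a minimal sketch, assuming the trained neuron vectors are stored in a NumPy array of shape (rows, cols, dim) on a square grid; the function name `hit_counts` and the toy values are hypothetical and used only for illustration.

```python
import numpy as np

def hit_counts(weights, data):
    """Count, for every map cell, the samples whose closest neuron is that cell."""
    rows, cols, dim = weights.shape
    flat = weights.reshape(-1, dim)
    # Squared Euclidean distance between every sample and every neuron.
    d2 = ((data[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    winners = d2.argmin(axis=1)  # index of the closest neuron per sample
    return np.bincount(winners, minlength=rows * cols).reshape(rows, cols)

# Toy 2 x 2 map in two dimensions (illustrative values only).
weights = np.array([[[0.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [1.0, 1.0]]])
data = np.array([[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.9, 1.0]])
counts = hit_counts(weights, data)
print(counts)
```

Cells with a count of zero are the unpopulated clusters mentioned above, and a count of one corresponds to a cluster with a single member.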

Together with the trained SOM representing the distribution of the data, it

is important to assess the so-called U-matrix (unified distance matrix),

which visualizes the distances between neighboring neurons in the SOM. It is

basically a two-dimensional representation of the elastic SOM grids. By

assessing the U-matrix, it is possible to visualize the spatial distribution

of data clusters, that is, to detect, for example, regions of the map that

are either highly populated with the data or regions where there are

almost no data. Mapping techniques allow analysis of complex

multidimensional data in an intuitively comprehensible visual manner

(Ivanenkov et al., 2009). The color scale of SOMs is built up according

to the original values of the high-dimensional data.
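The U-matrix idea can be sketched in a few lines. This is a minimal illustration, not the method of any particular SOM package: it assumes a square grid, neuron vectors stored in an array of shape (rows, cols, dim), and it averages the Euclidean distance from each neuron to its four direct grid neighbors. The function name `u_matrix` and the toy map are hypothetical.

```python
import numpy as np

def u_matrix(weights):
    """Mean Euclidean distance from each neuron to its 4-connected grid neighbors."""
    rows, cols, _ = weights.shape
    umat = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(np.linalg.norm(weights[r, c] - weights[nr, nc]))
            # A 1 x 1 map has no neighbors; leave its value at zero.
            umat[r, c] = np.mean(dists) if dists else 0.0
    return umat

# Toy 4 x 4 map in three dimensions: the left half sits at (0, 0, 0),
# the right half at (1, 1, 1), so a gap runs between columns 1 and 2.
weights = np.zeros((4, 4, 3))
weights[:, 2:, :] = 1.0
umat = u_matrix(weights)
print(np.round(umat, 3))
```

In this toy map the outer columns have zero distance to their neighbors (dense regions), while the two middle columns show larger values that mark the boundary between the two groups of neurons.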

Figure 5.15 is an example of SOMs developed for three-dimensional

space (three variables: Var1, Var2, and Var3), together with the U-matrix,

which represents two data clusters (blue color) that are separated by

an empty region (red color). The color scale next to the U-matrix

indicates that the blue color is used to mark the short distance between

map cells (therefore these clusters are more densely populated), whereas

the red color is used for large distances. Elastic bonds of neurons are

spreading through the red colored empty region. If we take a look

at other maps, we can see that three-dimensional space is reduced to

two-dimensional space, but with preserved topological features. This

means that map cells are uniformly distributed, such that each sample

has the same position on each of the maps. The top left corner cell,

for example, is one sample and the Var1, Var2, and Var3 values can be

read from the maps. The color scale next to the maps indicates numeric

or discrete values of samples (data normalization is often required). If

we analyze data clusters in each of three SOMs, we can observe that

the upper cluster has the highest values of all the variables (Var1, Var2,

and Var3).

As previously mentioned, vectors of the SOM neurons m_i have

traditionally been randomly initialized. It was supposed that SOM neurons

have a strong tendency to self-organize, so that the order can emerge even

when starting from a disordered state. Over time, it was demonstrated

that computation of SOMs can be much faster if the initial values of

neuron vectors are selected from values corresponding to the largest

principal components of the presented data set.
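Initialization along the principal components can be sketched as follows. This is one common variant, given here as an assumption rather than as the exact procedure the text refers to: the neuron vectors are placed on a regular grid in the plane spanned by the two largest principal components, scaled by one standard deviation along each. The function name `pca_init`, the scaling choice, and the synthetic data are hypothetical.

```python
import numpy as np

def pca_init(data, rows, cols):
    """Place initial neuron vectors on the plane of the two leading principal components."""
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    std = s[:2] / np.sqrt(len(data) - 1)  # standard deviation along PC1 and PC2
    a = np.linspace(-1.0, 1.0, rows)      # grid coordinate along PC1
    b = np.linspace(-1.0, 1.0, cols)      # grid coordinate along PC2
    return (mean
            + a[:, None, None] * std[0] * vt[0]
            + b[None, :, None] * std[1] * vt[1])

# Synthetic three-variable data with unequal spread (illustrative only).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])
weights = pca_init(data, 5, 5)
print(weights.shape)
```

Because the initial map is already spread along the directions of greatest variance, training starts from an ordered state instead of having to untangle a random one, which is the source of the speed-up described above.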
