Database Reference
FIGURE 15.14
Trained SOM network [5 × 2] (axes scaled 0 to 1; recovered axis label: Weight 1).
integrated data set (Figure 15.14). Using PCA to select the design of the SOM
network is unique and completely dynamic: depending on the data quality
and variance, the size of the selected SOM varies. This design made
the SOM training fully adaptable to the dynamic nature of the available integrated
data. As the SOM was trained with the whole data matrix containing 40 input variables,
each SOM neuron had 40 internal weights, one per input, that were updated iteratively. After a number of
training iterations, the weights of the partially trained SOM could be visualized
as 2D visual patterns, or weight maps. SOM weight planes were used in the training
window to obtain these maps. There was one weight plane (or 2D visual pattern map)
for each element of the input vector (40 in this case). Each plane visualized the
weights that connect one input to each of the neurons. This SOM-based analysis and
visualization provided a unique node-connection pattern representation of the
input space. Lighter and darker colors represent larger and smaller weights, respectively.
If the connection color patterns of two inputs are very similar, the
two inputs are inferred to be highly correlated. Highly correlated attributes contribute almost
the same data variance, so unless both variables are
required for a specific application, one of them can be ignored, reducing the
problem dimensionality and increasing the overall data variance. The novelty of
this SOM-based approach is that it provides visual patterns for all the variables,
so database customers can easily identify the most important
variables by simple visual inspection, without even accessing any data. The SOM-based
visual representation method provides a dynamic way to present
knowledge about the Big Data. SOM clustering on the integrated preprocessed data
is quite useful, as this technique provides a 2D visual map representation of the
whole database and a natural grouping of the data attributes (Figure 15.15). Using
this knowledge map (or a region of the map), a user could design an application, so
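The weight-plane comparison described in this passage can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it trains a small 5 × 2 SOM on synthetic data with 4 variables (instead of the 40 in the text), the function names `train_som`, `weight_planes`, and `plane_similarity` are hypothetical, the learning-rate and neighborhood schedules are arbitrary choices, and the Pearson correlation between planes stands in for the visual comparison of color patterns. The PCA-based choice of map size is not reproduced here.

```python
import numpy as np

def train_som(data, rows=5, cols=2, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = data.shape
    weights = rng.random((rows * cols, n_features))
    # Grid coordinates of each neuron, used by the Gaussian neighborhood.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total_steps = epochs * n_samples
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(n_samples):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # assumed linear decay
            sigma = sigma0 * (1.0 - frac) + 0.1     # assumed shrinking neighborhood
            x = data[idx]
            # Best-matching unit: neuron whose weight vector is closest to x.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def weight_planes(weights, rows, cols):
    """One 2D plane per input variable: shape (n_features, rows, cols)."""
    return weights.T.reshape(-1, rows, cols)

def plane_similarity(planes):
    """Pearson correlation between flattened weight planes (a proxy for visual similarity)."""
    flat = planes.reshape(planes.shape[0], -1)
    return np.corrcoef(flat)

# Synthetic data (assumption): variable 1 is a noisy copy of variable 0,
# variables 2 and 3 are independent.
rng = np.random.default_rng(42)
x0 = rng.random(200)
data = np.column_stack([
    x0,
    x0 + 0.01 * rng.standard_normal(200),
    rng.random(200),
    rng.random(200),
])

weights = train_som(data, rows=5, cols=2)
planes = weight_planes(weights, rows=5, cols=2)
similarity = plane_similarity(planes)
print(np.round(similarity, 2))
```

In this sketch the planes of variables 0 and 1 come out nearly identical, which is the situation in which the text suggests one of the two variables can be dropped. With only ten neurons, the correlation between planes of unrelated variables is a rough indicator at best, so it should be read alongside the plots rather than in place of them.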