any processing delay. The objective of this analysis was to establish a list of the least
correlated semantic attributes, which contribute most of the data variance. g-SOM
was applied to the least correlated attributes selected by PCA to estimate the natural
grouping of the data. This dynamic data analysis provided semantic attributes
ranked by importance, which was effectively a valuable recommendation
about the whole integrated data set for any future application design. g-SOM
clustering on the integrated preprocessed data was quite useful, as this technique
provided a 2D visual map representation of the whole database and the natural grouping
of the data attributes. Using this knowledge map (or a region of the map), the user
could design an application or decide which variables to consider,
so the purpose of the SOM on the database was to provide a visual knowledge
recommendation system. Based on the knowledge recommendation, the user could also
optimize Big Data usage by prioritizing relevant data, minimizing unwanted data
downloads, and reducing data-processing time. The recommendations from the machine
learning clustering algorithms were also published in RDF format to represent the
extracted knowledge in a completely machine-readable manner that can be
interpreted programmatically [42,52]. Development of this unique system based on
semantic data integration and machine learning-based data recommendation was the
main achievement of this study. This semantically guided, machine learning-based
approach provided a great deal of flexibility in terms of data, knowledge, and
provenance integration. A big knowledge integration and recommendation architecture,
based on complementary knowledge integration, could provide a generic knowledge
platform for any future environmental decision support application system [22,42].
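The attribute recommendation described above can be illustrated with a minimal sketch. It is not the study's implementation: a simple greedy filter on variance and pairwise Pearson correlation stands in for the PCA and g-SOM steps, the attributes are assumed to be already harmonized to comparable scales, and the attribute names, sample values, the 0.8 correlation threshold, and the `http://example.org/recommendation/` namespace are all hypothetical. The ranked result is written out as N-Triples to show one way a recommendation could be made machine readable.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_least_correlated(columns, max_abs_corr=0.8):
    """Visit attributes by decreasing variance and keep an attribute only if
    it is weakly correlated with every attribute already kept."""
    by_variance = sorted(columns, key=lambda n: pvariance(columns[n]), reverse=True)
    kept = []
    for name in by_variance:
        if all(abs(pearson(columns[name], columns[k])) < max_abs_corr for k in kept):
            kept.append(name)
    return kept


def to_ntriples(ranked, base="http://example.org/recommendation/"):
    """Serialize the ranking as N-Triples (hypothetical namespace and predicate)."""
    xsd_int = "http://www.w3.org/2001/XMLSchema#integer"
    return "\n".join(
        f'<{base}{name}> <{base}rank> "{rank}"^^<{xsd_int}> .'
        for rank, name in enumerate(ranked, start=1)
    )


# Hypothetical, already harmonized sample attributes (not from the study).
columns = {
    "rainfall": [0.0, 12.0, 3.0, 25.0, 8.0],
    "soil_moisture": [0.1, 1.3, 0.4, 2.4, 0.9],  # tracks rainfall closely
    "temperature": [18.0, 21.0, 15.0, 19.0, 24.0],
}

ranked = rank_least_correlated(columns)
triples = to_ntriples(ranked)
print(ranked)
print(triples)
```

In this sample, the attribute that closely tracks another one is dropped, and the remaining attributes are ranked by variance. A real pipeline would replace the greedy filter with PCA loadings and pass the selected attributes to a SOM for clustering.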
15.3 DATA TO KNOWLEDGE ARCHITECTURE
The design of the knowledge integration architecture was motivated by the fact that none
of the existing data model integration architectures was capable of handling,
processing, and analyzing multiple large environmental data sources simultaneously.
A database on its own does not carry any weight unless its data is converted into
knowledge. True data integration is completely dependent on contextual integration of data
sources, where the physical attribute-based parametric integration is complemented
with the semantically matched metadata, information about the purpose of the data,
data usability information, and knowledge recommendation based on unsupervised
data analysis. In designing the architecture, the focus was mainly on developing
the capability to integrate the final outcomes and publish them to
the LOD cloud. The main purpose of this architecture was to integrate knowledge
from different sources, analyze it, and recommend knowledge in a way that would have
the highest possible accessibility on the web, so that next-generation environmental
application designers could access the recommendations, the knowledge, and the
original data sources programmatically. The architecture had seven
layers, namely the Web Adaptors Layer, Data Harmonization Layer, Semantic
Cross Validation Layer, Feature Representation Layer, SOM-Based Knowledge
Recommendation Layer, RDF Conversion Triple Store implementation Layer, and
LOD Publishing Layer, which are described in detail in the following sections
(Figure 15.3).
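The layered design can be pictured as a chain of processing stages. The following is only a structural sketch: it assumes the seven layers run strictly in the order listed, and every layer body is a placeholder that records its own name rather than performing the processing described in the later sections. The names `make_placeholder`, `run_pipeline`, and the source identifiers are hypothetical.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Layer = Callable[[State], State]

# Layer names as listed in the text; the strict ordering is an assumption.
LAYER_NAMES: List[str] = [
    "Web Adaptors Layer",
    "Data Harmonization Layer",
    "Semantic Cross Validation Layer",
    "Feature Representation Layer",
    "SOM-Based Knowledge Recommendation Layer",
    "RDF Conversion Triple Store implementation Layer",
    "LOD Publishing Layer",
]


def make_placeholder(name: str) -> Layer:
    """Return a stand-in layer that only appends its name to a trace."""
    def layer(state: State) -> State:
        new_state = dict(state)  # copy so earlier states stay unchanged
        new_state["trace"] = state.get("trace", []) + [name]
        return new_state
    return layer


def run_pipeline(layers: List[Layer], state: State) -> State:
    """Pass the state through each layer in order."""
    for layer in layers:
        state = layer(state)
    return state


pipeline = [make_placeholder(name) for name in LAYER_NAMES]
initial = {"sources": ["hypothetical-source-a", "hypothetical-source-b"]}
result = run_pipeline(pipeline, initial)

for step, name in enumerate(result["trace"], start=1):
    print(step, name)
```

Keeping each layer as a function from state to state means a placeholder can later be swapped for a real implementation without changing the rest of the chain.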