Recommending Environmental Big Data Using Semantically Guided Machine Learning - Large Scale and Big Data: Processing and Management

any processing delay. The objective of this analysis was to establish a list of the least correlated semantic attributes that contribute most to the data variance. g-SOM was applied to the least correlated attributes selected by PCA to estimate the natural grouping of the data. This dynamic data analysis provided semantic attributes ranked by importance, which was effectively a valuable recommendation about the whole integrated data set for any future application design. g-SOM clustering on the integrated, preprocessed data was useful because it provided a 2D visual map of the whole database and the natural grouping of the data attributes. Using this knowledge map (or a region of it), the user could design an application or decide which variables to consider; the purpose of the SOM on the database was thus to provide a visual knowledge recommendation system. Based on the knowledge recommendation, the user could also optimize Big Data usage by prioritizing data, minimizing unwanted data downloads, and reducing data-processing time. The recommendations from the machine learning clustering algorithms were also published in RDF format to represent the extracted knowledge in a completely machine-readable manner that can be interpreted programmatically [42,52]. Development of this system, based on semantic data integration and machine learning-based data recommendation, was the main achievement of this study. This semantically guided machine learning-based approach provided a great deal of flexibility in terms of data, knowledge, and provenance integration. A big knowledge integration and recommendation architecture based on complementary knowledge integration could provide a generic knowledge platform for any future environmental decision support application system [22,42].
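As a rough illustration of the attribute recommendation idea, the following sketch ranks attributes by their loading on the first principal component of the correlation matrix, drops any attribute that is highly correlated with one already kept, and serializes the resulting ranking as N-Triples. It is a simplified stand-in under stated assumptions, not the study's method: it uses only one principal component (found by power iteration), it does not implement g-SOM, and the attribute names, sample values, `max_corr` threshold, and `http://example.org/` vocabulary are all hypothetical.

```python
import math
from urllib.parse import quote


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns (assumes no constant column)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(d)
    ]
    z = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    return [
        [sum(zr[i] * zr[j] for zr in z) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(matrix, iterations=200):
    """Approximate the leading eigenvector by power iteration."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def recommend_attributes(names, rows, max_corr=0.9):
    """Rank attributes by |loading|, keeping only weakly correlated ones."""
    corr = correlation_matrix(rows)
    loadings = first_component(corr)
    order = sorted(range(len(names)), key=lambda i: abs(loadings[i]), reverse=True)
    kept = []
    for i in order:
        # Keep attribute i only if it is not a near-duplicate of a kept one.
        if all(abs(corr[i][k]) < max_corr for k in kept):
            kept.append(i)
    return [names[i] for i in kept]


def to_ntriples(ranked_names, base="http://example.org/"):
    """Serialize the ranking as N-Triples lines (hypothetical vocabulary)."""
    xsd_int = "http://www.w3.org/2001/XMLSchema#integer"
    return [
        f'<{base}attribute/{quote(name)}> <{base}vocab#rank> "{rank}"^^<{xsd_int}> .'
        for rank, name in enumerate(ranked_names, start=1)
    ]


# Hypothetical sample: two nearly identical rainfall gauges and one temperature series.
names = ["rainfall", "rainfall_gauge_b", "temperature"]
rows = [
    [1.0, 1.1, 2.0],
    [2.0, 2.0, 1.0],
    [3.0, 2.9, 4.0],
    [4.0, 4.2, 3.0],
    [5.0, 5.0, 3.0],
]
ranked = recommend_attributes(names, rows)
triples = to_ntriples(ranked)
```

With this sample, only one of the two rainfall series survives the correlation filter, while the temperature series is kept. A consumer could load the emitted lines into any triple store that accepts N-Triples and query the ranks before deciding which data to download.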

15.3 DATA TO KNOWLEDGE ARCHITECTURE

The design of the knowledge integration architecture was motivated by the fact that none of the existing data model integration architectures was capable of handling, processing, and analyzing multiple large environmental data sources simultaneously. A database on its own does not carry any weight unless its data is converted into knowledge. True data integration depends entirely on the contextual integration of data sources, where the physical attribute-based parametric integration is complemented by semantically matched metadata, information about the purpose of the data, data usability information, and knowledge recommendation based on unsupervised data analysis. In designing the architecture, the focus was mainly on developing the capability to get the final outcomes integrated and published with the LOD cloud. The main purpose of this architecture was to integrate knowledge from different sources and to analyze and recommend that knowledge in a way that gives it the highest possible accessibility on the web, so that the next-generation environmental application designer could access the recommendations, the knowledge, and the original data sources programmatically. The architecture had seven layers, namely the Web Adaptors Layer, Data Harmonization Layer, Semantic Cross Validation Layer, Feature Representation Layer, SOM-Based Knowledge Recommendation Layer, RDF Conversion Triple Store Implementation Layer, and LOD Publishing Layer, which are described in detail in the following sections (Figure 15.3).
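One way to picture the seven layers is as an ordered chain of stages, each taking the output of the previous one. The sketch below assumes the layers run strictly in the order listed, which this section does not state explicitly. Every stage is a hypothetical placeholder that only records that it ran; the `passthrough` and `run_pipeline` helpers and the sample record are illustrative names, not part of the described system.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Stage = Callable[[Record], Record]

# Layer names as listed in the text, in the order given there.
LAYER_NAMES: List[str] = [
    "Web Adaptors Layer",
    "Data Harmonization Layer",
    "Semantic Cross Validation Layer",
    "Feature Representation Layer",
    "SOM-Based Knowledge Recommendation Layer",
    "RDF Conversion Triple Store Implementation Layer",
    "LOD Publishing Layer",
]


def passthrough(layer_name: str) -> Stage:
    """Build a placeholder stage that appends its name to a provenance trace."""

    def stage(record: Record) -> Record:
        trace = list(record.get("trace", []))
        trace.append(layer_name)
        # Return a new record so earlier stages' outputs are not mutated.
        return {**record, "trace": trace}

    return stage


PIPELINE: List[Tuple[str, Stage]] = [(name, passthrough(name)) for name in LAYER_NAMES]


def run_pipeline(record: Record, pipeline: List[Tuple[str, Stage]] = PIPELINE) -> Record:
    """Apply each layer in order and return the final record."""
    for _name, stage in pipeline:
        record = stage(record)
    return record


# Hypothetical input record; a real adaptor would fetch data from a web source.
result = run_pipeline({"source": "hypothetical-sensor-feed"})
```

Keeping each layer behind the same function signature means a placeholder can be swapped for a real implementation without touching the others, and the accumulated trace gives a simple form of provenance for the final published record.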
