Recommending Environmental Big Data Using Semantically Guided Machine Learning - Large Scale and Big Data: Processing and Management


data sources, capture features, unify, and integrate knowledge in a meaningful way that could be used for any future application. The idea was to process the raw data, capture knowledge based on feature extraction, and maintain the provenance information along with the extracted knowledge. Capturing the available provenance information as part of the knowledge integration was important so that knowledge from the data sources could be traced back to its origin. Figure 15.2 shows the dynamic workflow of the proposed system. The aim was to capture knowledge from multiple environmental Big Data sources so that large-scale cross-validation and complementary knowledge integration could be conducted. This flow diagram explains the concept and motivation behind this chapter: recommending the available Big Data autonomously.

This study considered five large environmental data sources for large-scale, unified, complementary knowledge integration. The knowledge integration architecture was designed with two processing parts, namely “data accumulation through web integration” and “integrated knowledge recommendation based on machine learning.” For any given geographical location and time frame, data from the five sources were acquired automatically using intelligent web data adaptors developed for this purpose. All data sets were then preprocessed, integrated, and represented in a unified, structured resource format. Unified RDF knowledge files were created for all the environmental data sources based on the preprocessed data, available metadata, and original provenance information. The provenance information (such as the data origin, author, and time, along with all the raw data) was also captured within the integrated RDF knowledge file and stored in the triple store knowledge base.
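As a rough illustration of keeping provenance next to integrated knowledge, the sketch below serializes one preprocessed observation as N-Triples and records its origin, author, and time with W3C PROV-O terms (prov:wasDerivedFrom, prov:wasAttributedTo, prov:generatedAtTime). The chapter does not specify its vocabulary or schema, so the namespace http://example.org/env#, the record fields, the helper names (iri, literal, observation_to_ntriples), and all sample values are hypothetical. This is a minimal sketch under those assumptions, not the chapter's implementation.

```python
from datetime import datetime, timezone
from typing import Optional

# Real W3C vocabulary IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"
XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical namespace used only for this sketch.
EX = "http://example.org/env#"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def literal(value, datatype: Optional[str] = None) -> str:
    """Quote a value as an N-Triples literal, optionally typed."""
    # Escape backslashes first, then quotes and newlines, as N-Triples requires.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"^^<{datatype}>' if datatype else f'"{text}"'


def observation_to_ntriples(obs_id, record, source_iri, author_iri, generated_at):
    """Emit one observation plus its provenance as N-Triples lines."""
    subject = iri(EX + obs_id)
    triples = [
        (subject, iri(RDF_TYPE), iri(EX + "Observation")),
        (subject, iri(EX + "site"), literal(record["site"])),
        (subject, iri(EX + "variable"), literal(record["variable"])),
        (subject, iri(EX + "value"), literal(record["value"], XSD + "double")),
        # Provenance stored alongside the data: origin, author, and time.
        (subject, iri(PROV + "wasDerivedFrom"), iri(source_iri)),
        (subject, iri(PROV + "wasAttributedTo"), iri(author_iri)),
        (subject, iri(PROV + "generatedAtTime"),
         literal(generated_at.isoformat(), XSD + "dateTime")),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


# Usage with made-up values (hypothetical source, agent, and reading).
lines = observation_to_ntriples(
    "obs-001",
    {"site": "Site-A", "variable": "air_temperature", "value": 21.5},
    "http://example.org/source/weather-feed",
    "http://example.org/agent/data-provider",
    datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc),
)
print("\n".join(lines))
```

Lines in this form can be bulk-loaded into most triple stores; an RDF library such as rdflib would handle escaping and serialization more completely than hand-built strings.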

The next part of the chapter covers a data and knowledge recommendation system based on a novel mixture of unsupervised machine learning clustering algorithms. Clustering algorithms based on principal component analysis (PCA) [43] and a guided self-organizing map (g-SOM) [44] were used to process data dynamically without

FIGURE 15.2 Motivation and workflow of this chapter.
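To make the PCA side of the recommendation step more concrete, here is a minimal, dependency-free sketch under a stated assumption: feature vectors from the integrated data are projected onto their first principal component (found by power iteration on the sample covariance matrix), and the one-dimensional scores are split into two groups with a small k-means. This is a generic PCA-plus-clustering pattern, not the chapter's algorithm; the function names, the choice of two clusters, and the sample readings are hypothetical, and the guided SOM [44] is not shown.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return the column means and the leading covariance eigenvector.

    Uses plain power iteration; adequate for small, well-separated examples.
    Requires at least two rows (sample covariance divides by n - 1).
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # all rows identical: no dominant direction
            break
        vec = [x / norm for x in nxt]
    return mean, vec


def pca_two_clusters(rows, max_iter=100):
    """Project rows onto the first PC, then split the scores with 1-D 2-means."""
    mean, pc = first_principal_component(rows)
    scores = [sum((r[j] - mean[j]) * pc[j] for j in range(len(pc))) for r in rows]
    c0, c1 = min(scores), max(scores)  # initial centroids at the extremes
    labels = None
    for _ in range(max_iter):
        new_labels = [0 if abs(s - c0) <= abs(s - c1) else 1 for s in scores]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        group0 = [s for s, lab in zip(scores, labels) if lab == 0]
        group1 = [s for s, lab in zip(scores, labels) if lab == 1]
        if group0:
            c0 = sum(group0) / len(group0)
        if group1:
            c1 = sum(group1) / len(group1)
    return labels, scores


# Hypothetical (temperature, relative humidity) readings from two regimes.
rows = [[10.0, 80.0], [11.0, 82.0], [10.5, 79.0],
        [25.0, 40.0], [26.0, 42.0], [24.5, 41.0]]
labels, scores = pca_two_clusters(rows)
print(labels)
```

Because the sign of a principal component is arbitrary, the numeric cluster IDs can swap between implementations; only the grouping itself is meaningful. With features in different units, as here, standardizing each column before PCA is usually advisable, and a numerical library would replace the hand-written linear algebra in practice.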
