data sources, capture features, unify, and integrate knowledge in a meaningful way
that could be used for any future application. The idea was to process the raw data,
capture feature extraction-based knowledge, and also maintain the provenance infor-
mation along with the extracted knowledge. Capturing available provenance infor-
mation as a part of the knowledge integration was important so that knowledge from
the data sources could be traced back to the origin. Figure 15.2 shows the dynamic
workflow of the proposed system. The aim was to capture knowledge from multiple
environmental Big Data sources so that large-scale cross-validation and
complementary knowledge integration could be conducted. This flow diagram
explains the concept and motivation behind this chapter: recommending the
available Big Data autonomously. This study considered five different large
environmental data sources for large-scale unified complementary knowledge
integration. The knowledge integration architecture consisted of two processing
parts, namely “data accumulation through web integration” and “integrated
knowledge recommendation based on machine learning.” For any
given geographical location and a given time frame, five different data sources were
acquired automatically using intelligent web data adaptors, which were developed
for this purpose. All data sets were then preprocessed, integrated, and represented
in a unified, structured resource format. Unified knowledge RDF files were created for
all the environmental data sources based on preprocessed data, available metadata,
and original provenance information. The provenance information (such as origin
of data, author, and time, along with all the raw data) was also captured within
the integrated RDF knowledge file and stored in the triple store knowledge base.
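
As a rough illustration of how an observation and its provenance could be held together as RDF triples, consider the minimal sketch below. It is not the chapter's implementation: the use of the W3C PROV-O terms (prov:wasDerivedFrom, prov:wasAttributedTo, prov:generatedAtTime), the example.org namespace, the function names, and the sample values are all assumptions made for this example, and a plain Python list stands in for the triple store.

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"          # W3C PROV-O namespace
XSD = "http://www.w3.org/2001/XMLSchema#"    # XML Schema datatypes
EX = "http://example.org/env/"               # hypothetical namespace for this sketch


def iri(value):
    """Wrap a string as an N-Triples IRI term."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Build an N-Triples literal, escaping backslashes, quotes, and newlines."""
    escaped = (str(value).replace("\\", "\\\\")
               .replace('"', '\\"').replace("\n", "\\n"))
    suffix = f"^^<{datatype}>" if datatype else ""
    return f'"{escaped}"{suffix}'


def observation_triples(obs_id, variable, value, source_url, agent_id, retrieved_at):
    """Describe one observation plus its origin, author, and retrieval time."""
    subject = iri(EX + "observation/" + obs_id)
    return [
        (subject, iri(EX + "variable"), literal(variable)),
        (subject, iri(EX + "value"), literal(value, XSD + "double")),
        # Provenance: where the value came from, who published it, and when.
        (subject, iri(PROV + "wasDerivedFrom"), iri(source_url)),
        (subject, iri(PROV + "wasAttributedTo"), iri(EX + "agent/" + agent_id)),
        (subject, iri(PROV + "generatedAtTime"),
         literal(retrieved_at.isoformat(), XSD + "dateTime")),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples text, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def trace_origin(triples, subject):
    """Return the source IRIs that a subject was derived from."""
    derived = iri(PROV + "wasDerivedFrom")
    return [o for s, p, o in triples if s == subject and p == derived]


# Made-up sample record, for illustration only.
store = observation_triples(
    "obs-1", "air_temperature", 21.5,
    "http://example.org/source/station-42", "agency-a",
    datetime(2015, 1, 1, tzinfo=timezone.utc),
)
print(to_ntriples(store))
print(trace_origin(store, iri(EX + "observation/obs-1")))
```

Because the provenance statements share the same subject as the observation itself, tracing a piece of knowledge back to its origin reduces to a simple lookup on that subject.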
The next part of the chapter covers a data and knowledge recommendation system
based on a novel mixture of unsupervised machine learning clustering algorithms.
Clustering algorithms based on principal component analysis (PCA) [43] and guided
self-organizing map (g-SOM) [44] were used to process data dynamically without
FIGURE 15.2 Motivation and workflow of this chapter.
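
To give a feel for the kind of unsupervised clustering referred to in the text, the following minimal sketch projects multivariate records onto their first principal component and then groups the projected scores with a plain one-dimensional self-organizing map. It rests on several assumptions: the SOM here is the standard algorithm, not the guided variant (g-SOM) of [44]; chaining PCA into the SOM is a choice made only for this example; and the function names, parameters, and sample readings are hypothetical.

```python
import math
import random


def first_principal_component(rows, iterations=200):
    """Return (column means, unit vector of the dominant axis) by power iteration.

    Assumes at least two rows and a non-degenerate covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return mean, vec


def project(rows, mean, axis):
    """Score each row along the given axis after centering."""
    return [sum((r[j] - mean[j]) * axis[j] for j in range(len(axis))) for r in rows]


def train_som_1d(values, n_units=2, epochs=50, seed=0):
    """Train a plain 1-D SOM on scalar inputs and return the unit weights."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    # Spread the initial weights evenly across the data range.
    weights = [lo + (hi - lo) * (k + 0.5) / n_units for k in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = 0.5 * frac                          # learning rate decays linearly
        radius = max(n_units / 2.0 * frac, 0.5)  # neighbourhood shrinks, floor 0.5
        for v in rng.sample(values, len(values)):
            # Best-matching unit: the unit whose weight is closest to the input.
            bmu = min(range(n_units), key=lambda k: abs(weights[k] - v))
            for k in range(n_units):
                # Gaussian neighbourhood on the unit grid around the BMU.
                h = math.exp(-((k - bmu) ** 2) / (2 * radius ** 2))
                weights[k] += lr * h * (v - weights[k])
    return weights


def assign(values, weights):
    """Label each value with the index of its closest SOM unit."""
    return [min(range(len(weights)), key=lambda k: abs(weights[k] - v))
            for v in values]


# Made-up readings with two obvious groups, for illustration only.
readings = [[1.0, 2.0, 1.5], [1.2, 1.9, 1.4], [0.9, 2.1, 1.6],
            [8.0, 9.0, 8.5], [8.2, 9.1, 8.4], [7.9, 8.8, 8.6]]
mean, axis = first_principal_component(readings)
scores = project(readings, mean, axis)
labels = assign(scores, train_som_1d(scores))
print(labels)
```

No labels are supplied at any point: the grouping emerges from the data alone, which is the property that makes this family of methods suitable for processing incoming data dynamically.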