Recommending Environmental Big Data Using Semantically Guided Machine Learning - Large Scale and Big Data: Processing and Management

Because the learning rate decreases over time in competitive learning, large weight changes are possible only at the beginning of the process. Here v(u_s, t) denotes the neighborhood function of the neuron u_s. Later, as the radius r(t) of the neighborhood of the winner neuron decreases, only neurons close to the winner are affected. The weight vectors of a SOM can gradually develop into an approximately regular grid after being subjected to input patterns uniformly distributed over the unit square. SOMs are suitable for solving free learning problems, but it can also be advantageous to use them to partition the input domain of a fixed learning problem, as in counterpropagation networks. First, the input domain is partitioned, and then the mean value of the outputs given by the learning problem is determined for each individual set of the partition. Finally, for all inputs classified by the same neuron of the competition network, the counterpropagation network provides the mean value over this set as output. This kind of network can learn only piecewise constant functions correctly; linear associators, trained with the delta rule, can be used to extend the range of applications to linear functions. The g-SOM clustering of the integrated, selected, preprocessed data was quite useful, as this technique provided a 2D visual map representation of the whole database and a natural grouping of the data attributes.
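The competitive learning process described above can be illustrated with a minimal sketch. It is not the g-SOM used in the chapter: the grid size, the linear decay schedules for the learning rate and the radius r(t), the Gaussian form of the neighborhood function, and the function name `train_som` are all assumptions chosen for illustration. Inputs are drawn uniformly from the unit square, as in the text.

```python
import math
import random


def train_som(rows=5, cols=5, steps=2000, eta0=0.5, r0=2.5, seed=0):
    """Train a small 2D SOM on points drawn uniformly from the unit square.

    Hypothetical helper: the schedules and the Gaussian neighborhood are
    illustrative assumptions, not taken from the chapter.
    """
    rng = random.Random(seed)
    # One 2D weight vector per grid neuron, randomly initialised.
    weights = {(i, j): [rng.random(), rng.random()]
               for i in range(rows) for j in range(cols)}
    for t in range(steps):
        frac = t / steps
        eta = eta0 * (1.0 - frac)             # learning rate decreases over time
        radius = max(r0 * (1.0 - frac), 0.5)  # neighborhood radius r(t) shrinks
        x = (rng.random(), rng.random())      # input pattern from the unit square
        # Competition: the winner is the neuron whose weights are closest to x.
        winner = min(
            weights,
            key=lambda u: (weights[u][0] - x[0]) ** 2 + (weights[u][1] - x[1]) ** 2,
        )
        for u, w in weights.items():
            # Squared distance to the winner, measured on the neuron grid.
            grid_d2 = (u[0] - winner[0]) ** 2 + (u[1] - winner[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # neighborhood strength
            # Move the weight vector toward the input, scaled by eta and h.
            w[0] += eta * h * (x[0] - w[0])
            w[1] += eta * h * (x[1] - w[1])
    return weights


som = train_som()
```

Early in training, the large learning rate and wide radius move most of the grid at once; toward the end, only the winner and its nearest grid neighbors are adjusted noticeably.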

15.3.6 RDF Conversion and Triple Store Implementation Layer

This layer was constructed based on RDF, uniform resource identifier (URI), and triple store technologies. The aim of this layer was to present the integrated complex knowledge and associated dynamic recommendations in a more meaningful, transparent, and highly accessible way. The W3C introduced the RDF format, which is now a standard model for machine-readable data presentation [7,18,27,55,62]. It decomposes data into triples (subject, predicate, and object) and assigns a URI to each resource or object. In computing, a URI is a string of characters used to identify a name or a resource. Such identification enables interaction with representations of the resource over a network (typically the WWW) using specific protocols. Each URI is defined by a scheme that specifies a concrete syntax and associated protocols. Through the URIs, it is possible to read information about a particular resource on the web over HTTP. A unified knowledge integration and representation model was developed using the RDF format. Unified knowledge RDFs were created for all the data sources based on preprocessed data, extracted semantic features, available metadata, and original provenance information. This made the integrated environmental feature-based knowledge ready for flexible web integration. The RDF format gave the semantic feature sets a unique capability to facilitate data integration even if the underlying schemas differed, and it specifically supported the evolution of schemas over time without requiring all the data consumers to be changed.
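To make the triple decomposition concrete, the following sketch builds a few subject, predicate, object statements and serializes them in the N-Triples syntax. The namespace `http://example.org/env/`, the resource and property names, the value 18.5, and the helper functions `uri`, `literal`, and `to_ntriples` are all hypothetical; the chapter does not list its actual vocabulary.

```python
# Hypothetical namespace for illustration only; not the chapter's vocabulary.
EX = "http://example.org/env/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def uri(value):
    """Wrap a URI string in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Format a literal, escaping backslashes and double quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^<{datatype}>' if datatype else f'"{text}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


station = uri(EX + "station/42")  # hypothetical resource
triples = [
    (station, uri(RDF_TYPE), uri(EX + "WeatherStation")),
    (station, uri(EX + "meanTemperature"), literal(18.5, XSD_DOUBLE)),
    (station, uri(EX + "derivedFrom"), uri(EX + "source/sensor-feed")),
]
document = to_ntriples(triples)
print(document)
```

Because every statement stands alone, a new property such as a provenance link can be added as one more triple without changing the statements that existing consumers already read.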

A triple store is a framework used for storing and querying RDF data. It provides a mechanism for persistent storage of and access to RDF graphs. Recently, there has been a major development initiative in query processing, access protocols, and triple store technologies. The knowledge integration framework was developed using a triple store called Sesame. Sesame (Figure 15.4) is an open-source framework for storage, inference, and querying of RDF data. Sesame matches the features of Jena, offering a connection API, inference support, and multiple back ends such as MySQL and Postgres [7,18,30,32,55,62].
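The idea of storing and querying triples can be sketched with a toy in-memory store. This is not the Sesame API and has no persistence or inference; the class `TinyTripleStore`, its methods, and the prefixed names such as `ex:station42` are hypothetical and serve only to show pattern matching over (subject, predicate, object) statements.

```python
class TinyTripleStore:
    """Toy in-memory triple store; an illustration, not the Sesame API."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Store one statement; duplicates are ignored because a set is used."""
        self._triples.add((subject, predicate, obj))

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TinyTripleStore()
store.add("ex:station42", "rdf:type", "ex:WeatherStation")
store.add("ex:station42", "ex:locatedIn", "ex:regionA")
store.add("ex:station7", "rdf:type", "ex:WeatherStation")

# Query: which subjects are typed as weather stations?
stations = [s for s, _, _ in store.match(p="rdf:type", o="ex:WeatherStation")]
print(stations)
```

A production triple store answers the same kind of pattern query, typically through a query language such as SPARQL, over persistent storage.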
