his/her original data and have it published in the system (e.g., for privacy reasons
the provider does not desire to publish all of his/her data); (b) the provider has
relational data and desires to import them into the system. An R2RML specification
is then defined and fed to the system, which initiates the unidirectional
relational-to-LD mapping process and also caters for keeping the two forms of data
synchronized. This automated publishing way is appropriate when the provider's
data are in relational form and the provider is able to define an R2RML
specification. The data provider can control which data are transformed and
imported via the R2RML specification; (c) the provider has XML-based data and
needs to transform and store them in the system. This publishing way is similar
to the previous one with the following exceptions: XML rather than relational data
are involved, the XML data need to be provided inline in the respective method
request, and the synchronization is not fully automated, as new data need to be
imported indirectly into the system by calling the addXSLMappings method again,
as sketched below.
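For publishing way (c), the interaction reduces to a request that carries both the XSL mapping and the XML data inline. The following is a minimal sketch under stated assumptions: only the method name addXSLMappings comes from the text, while the endpoint URL, payload keys, and response handling are hypothetical.

```python
# Minimal sketch of publishing way (c): registering an XSL mapping and pushing
# XML data inline via the addXSLMappings method. The endpoint URL, payload keys
# and response format below are assumptions made for illustration only.
import requests

ADD_XSL_MAPPINGS_URL = "http://example.org/ld-system/addXSLMappings"  # hypothetical

xsl_mapping = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- rules that transform the provider's XML into RDF/XML go here -->
</xsl:stylesheet>"""

xml_data = """<earthquakes>
  <earthquake id="eq-001" magnitude="4.2" country="GR"/>
</earthquakes>"""

response = requests.post(
    ADD_XSL_MAPPINGS_URL,
    json={
        "xslMappings": [xsl_mapping],  # the XSLT(s) mapping the XML to LD
        "xmlData": xml_data,           # XML data provided inline in the request
    },
    timeout=30,
)
response.raise_for_status()
print("Status:", response.status_code)
```

Because synchronization is not fully automated for XML sources, newly produced XML data would be pushed by repeating such a call.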
5 Linked Data Management Architecture
5.1 Previous Architecture Drawbacks and Current Solutions
While the previous architecture is able to address well the need to store a huge
amount of data as well as to perform load balancing in order to guarantee
a certain level of LD query/export performance, it suffers from the following
drawbacks: (a) it is quite costly, as it includes many load balancing components
and even more image instances; (b) the query performance is not adequate
for queries that do not target a particular RDF graph, as query results
from all scaling layers have to be collected and joined before being returned to
the user; and (c) updating is performed across all instances of a particular
scaling layer, thus creating increased traffic in the system as well as increasing
the update execution time (which could also deteriorate query performance in
cases or domains where the update frequency is higher).
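Drawback (b) stems from the scatter-gather pattern that queries not bound to a specific RDF graph imposed: every scaling layer had to be contacted and the partial results merged. The sketch below illustrates this pattern with SPARQLWrapper; the per-layer endpoint URLs are illustrative assumptions, not part of the described system.

```python
# Sketch of the scatter-gather querying behind drawback (b): a query that does
# not target a particular RDF graph must be sent to every scaling layer and the
# partial results merged before being returned to the user. The endpoint URLs
# are illustrative assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

SCALING_LAYER_ENDPOINTS = [            # hypothetical per-layer SPARQL endpoints
    "http://layer1.example.org/sparql",
    "http://layer2.example.org/sparql",
    "http://layer3.example.org/sparql",
]

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 100"

merged = []
for endpoint in SCALING_LAYER_ENDPOINTS:
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(QUERY)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()              # one round trip per layer
    merged.extend(results["results"]["bindings"])   # collect and join partial results

print("Total bindings collected from all layers:", len(merged))
```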
To resolve the above drawbacks, it was decided to rely on a simpler architecture
which is less costly and draws additional resources only when really needed. At
the same time, such an architecture provides the necessary sophistication to
adequately handle the challenges of distributed operations. This decision also
relied on the current and forthcoming patterns of system usage, where it is
expected that the majority of user requests will require querying and exporting
functionality rather than updating functionality. In fact, in all of the
applications currently supported by the system, updating was performed sparsely
and, only in some cases, slightly more frequently, in terms of a few times per
day (e.g., consider that a set of earthquakes occurring on the same day in a
certain country does not lead to frequent and large-scale updating of the data
stored in the system). Considering also that testing showed the performance of
Virtuoso to remain stable even for a huge and increasing amount of stored LD, it
was decided that there was no need to partition the data into different RDF
Stores scattered across different instances.
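Under the simplified architecture, querying and exporting are served by a single Virtuoso instance, so no per-layer merging is needed. A minimal export sketch follows, assuming Virtuoso's default SPARQL endpoint URL and a placeholder graph URI; neither is prescribed by the text.

```python
# Sketch of exporting LD from the single Virtuoso instance of the simplified
# architecture: one CONSTRUCT query against one SPARQL endpoint, with no need
# to collect and join partial results from several scaling layers. The endpoint
# uses Virtuoso's default port; the graph URI is a placeholder.
from SPARQLWrapper import SPARQLWrapper, RDFXML

VIRTUOSO_ENDPOINT = "http://localhost:8890/sparql"  # default Virtuoso SPARQL endpoint

sparql = SPARQLWrapper(VIRTUOSO_ENDPOINT)
sparql.setQuery("""
CONSTRUCT { ?s ?p ?o }
WHERE { GRAPH <http://example.org/graphs/earthquakes> { ?s ?p ?o } }
LIMIT 1000
""")
sparql.setReturnFormat(RDFXML)
graph = sparql.query().convert()   # rdflib.Graph holding the exported triples
print(graph.serialize(format="turtle"))
```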