A Cloud-Based, Geospatial Linked Data Management System - Transactions on Large-Scale-Data-and Knowledge-Centered Systems XX


However, before sketching and then realizing the new architecture, it had to be decided how to deal with the issue of updating, as this was creating traffic to the system if all Virtuoso servers involved in the instances had to be updated. Furthermore, the problem of bringing new instances up to the most recent RDF content had to be resolved. To this end, it was decided: (a) to directly update a Virtuoso server in only one instance, from now on called the master instance, via LMS and propagate the changes only to the instances currently running in the load balancing component, from now on called slave instances (which are, of course, fewer than those used in the previous architecture), and (b) to lazily update the image used to create new slave instances, from now on called the slave image, at regular intervals (e.g., every half an hour) and only when updates have occurred since the previous image update. While the first decision does not totally remedy the first problem, we followed it bearing in mind that the current (master and slave) instances should be up-to-date with respect to the RDF content, while new slave instances can be allowed to be slightly out of date, as this does not jeopardize the proper functioning of the applications supported by the system. Such lazy updating was a necessity, given that image updating can take minutes and is costly, so it cannot be performed each time a single update is performed in the system.
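The lazy image-update rule in decision (b) can be illustrated with a minimal sketch. The class and method names below (SlaveImagePolicy, recordUpdate, shouldRefresh, markRefreshed) are hypothetical and do not come from the system described here; the sketch only assumes that the rule is "refresh when the interval has elapsed and at least one update has occurred since the last refresh".

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical policy object for the lazy slave-image refresh; names are illustrative only. */
final class SlaveImagePolicy {
    private final Duration interval;     // e.g. 30 minutes
    private Instant lastImageRefresh;    // when the slave image was last rebuilt
    private Instant lastUpdate;          // null until the first RDF update is seen

    SlaveImagePolicy(Duration interval, Instant start) {
        this.interval = interval;
        this.lastImageRefresh = start;
    }

    /** Called whenever an RDF update has been applied to the master. */
    void recordUpdate(Instant when) {
        lastUpdate = when;
    }

    /** True only if the interval has elapsed AND an update arrived after the last refresh. */
    boolean shouldRefresh(Instant now) {
        boolean intervalElapsed = !now.isBefore(lastImageRefresh.plus(interval));
        boolean dirty = lastUpdate != null && lastUpdate.isAfter(lastImageRefresh);
        return intervalElapsed && dirty;
    }

    /** Called after the (costly) image rebuild has completed. */
    void markRefreshed(Instant when) {
        lastImageRefresh = when;
    }
}
```

Under this assumed rule, an idle system never pays the image-rebuild cost, and a busy one pays it at most once per interval.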

The above decisions had to be properly backed up by the respective technologies exploited. On the one hand, the free and latest version of Virtuoso does not allow the updating of many Virtuoso servers that might form a certain cluster in an automated way. Such updating is a proprietary feature of all Virtuoso versions. To this end, we proceeded to develop our own mechanism for updating the currently running Virtuoso servers by exploiting the underlying SQL functionality of Virtuoso. In the first place, we created triggers on the master instance that fired when the main RDF table of the respective Virtuoso server (i.e., the one named RDF_QUAD) was updated, in order to update the Virtuoso servers in the (running) slave instances. However, this ended up being quite slow, as the update finished only when all Virtuoso servers had been updated. To solve this, we decided to follow a log-based approach where the triggers write what is updated into a specific file (in the form of actual SQL statements) and then a Java program consumes the entries of this log file and is responsible for updating the remaining Virtuoso servers. This component, named Updater, is also responsible for updating the slave image only every half an hour and only when an update has occurred after the last slave image update. It exploits the Amazon Web Services SDK for Java (http://aws.amazon.com/documentation/sdk-for-java/) to find out the IPs of the remaining Virtuoso instances as well as to perform the slave image updating. Through this solution, the LD updating ends when the Virtuoso instance receiving the update request finishes processing it; the Virtuoso servers of the slave instances are updated subsequently via the Updater. As such, there is no actual delay in performing LD updating, and we allow for a small inconsistency until LD updating is propagated to the remaining Virtuoso servers which, as already stated, is acceptable.
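The log-based propagation can be sketched roughly as follows. This is not the actual Updater: the class LogReplayer and the SlaveServer interface are hypothetical, the log is assumed to hold one SQL statement per line, and the connection to each slave (for instance via JDBC, with addresses discovered through the AWS SDK) is abstracted away behind the interface. Error handling and retries are omitted.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch of a log consumer that replays logged SQL on every slave. */
final class LogReplayer {

    /** Assumed abstraction over one slave Virtuoso server (a real one might wrap a JDBC connection). */
    interface SlaveServer {
        void execute(String sql) throws Exception;
    }

    private final Path logFile;
    private int linesConsumed = 0; // number of log lines already replayed

    LogReplayer(Path logFile) {
        this.logFile = logFile;
    }

    /** Replays entries appended since the last call; returns how many statements were replayed. */
    int replayNewEntries(List<SlaveServer> slaves) throws Exception {
        List<String> lines = Files.readAllLines(logFile, StandardCharsets.UTF_8);
        int replayed = 0;
        for (int i = linesConsumed; i < lines.size(); i++) {
            String sql = lines.get(i).trim();
            if (sql.isEmpty()) {
                continue; // skip blank lines
            }
            for (SlaveServer slave : slaves) {
                slave.execute(sql); // same statement, same order, on every slave
            }
            replayed++;
        }
        linesConsumed = lines.size();
        return replayed;
    }
}
```

Because the master only appends to the log, its update returns as soon as the local write completes; the replay runs separately, which is where the short window of inconsistency comes from.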

On the other hand, the (basic) load balancer (LB) offered by the currently exploited cloud (Amazon EC2) does not offer the capability to route an update
