A Cloud-Based, Geospatial Linked Data Management System - Transactions on Large-Scale-Data-and Knowledge-Centered Systems XX


system in order to reach good levels of query performance. Otherwise, we also risk the system going down, as it cannot sustain operating at its limits for a long time.

The results of both experiments show that query time cannot serve as a scaling factor, especially since the query time exhibited by Virtuoso remains at an acceptable, almost stable level. CPU usage, on the other hand, can serve as a scaling factor, as it immediately indicates when the VM concerned has a hard time servicing the requests of concurrent users. In fact, the VM's CPU usage reaches quite high values, which can be considered dangerous for the health of the VM if they persist for a long time. Thus, the system must scale and obtain more resources in order to spread the incoming load evenly across all the reserved resources. The results show that a CPU threshold of 70 % can safely be used to determine when to scale.
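
As a minimal sketch of this scaling signal, the check below compares CPU usage against the 70 % threshold and ignores query time, in line with the experiments. The function name `cpu_signals_scale_out` is hypothetical, and averaging the usage over all currently reserved VMs is an assumption about how the load "across all the resources reserved" is summarised; it is not necessarily how the system itself computes it.

```python
from statistics import mean

# Threshold reported in the text: scale when CPU usage exceeds 70 %.
CPU_THRESHOLD_PERCENT = 70.0


def cpu_signals_scale_out(cpu_usage_per_vm, threshold=CPU_THRESHOLD_PERCENT):
    """Hypothetical check: return True if the average CPU usage (in percent)
    over all currently reserved VMs is above the threshold.

    Query time is deliberately not an input, since in the experiments it
    stays roughly stable and therefore does not signal saturation.
    """
    if not cpu_usage_per_vm:
        raise ValueError("at least one VM sample is required")
    # Assumption: the per-VM values are summarised by their plain mean.
    return mean(cpu_usage_per_vm) > threshold
```

For example, two VMs at 85 % and 78 % average 81.5 % and signal a scale-out, whereas 60 % and 72 % average 66 % and do not. Because the average is taken over all reserved VMs, the same check keeps signalling after a scale-out if splitting the work did not bring the load below the threshold.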

One could argue that such a limit is quite low with respect to the peak values exhibited in the two experiments. However, we deliberately set it at a much lower value to cater for cases where splitting the query work does not lead to a sharp decrease in CPU time, which indicates the need to further increase the resources to be utilized. This has been checked through other experiments with the rest of the queries, which show that this threshold indeed discriminates when Virtuoso has a hard time servicing the user requests. These experiments assessed the performance of queries whose complexity lies between that of the two queries considered, and they showed a similar behaviour, lying between the behaviours exhibited for those two queries. For this reason, we decided not to show these experiment results in this article. Based on the above analysis, the CPU threshold determination method can be considered rather complete, as it takes an exhaustive approach to ensure that the choice made is correct.

The main question is then how long to wait before scaling, given that the average CPU value remains constantly above the threshold obtained. The experiments show that the checking period should be as short as possible in order to offload the current VMs from the load they are currently expected to handle. To this end, we decided that the checking period should be 2 min, so that we are confident that the system is not experiencing a temporary spike in load but a high load that is more or less constant. This period length is appropriate for both experiment cases, as it also covers the greater need for immediate reactiveness in the first experiment case (see the smaller response times for this case with respect to the second). Considering also that it takes time (some minutes) to create a new instance, the chosen period length seems appropriate. With a higher value, we run the risk of reserving more resources when it is already too late with respect to the load incurred by the current instance. Again, this choice has been validated through an exhaustive approach, both in real time and in extreme synthetic cases, for all types of queries issued by the respective applications. Thus, it seems to be the most appropriate solution for the current situation as well as for forthcoming ones, once our system is exposed to additional end-user applications.
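
The waiting rule can be sketched as a small monitor that requests a scale-out only after the average CPU usage has stayed above the threshold for a whole 2-minute checking period, and that forgets the streak as soon as the usage drops back, so that a temporary spike is ignored. The class `SustainedLoadMonitor`, its `observe` method and the sampling scheme are hypothetical illustrations of the policy described above, not the system's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

CPU_THRESHOLD_PERCENT = 70.0  # threshold from the experiments
CHECK_PERIOD_SECONDS = 120.0  # 2-minute checking period from the text


@dataclass
class SustainedLoadMonitor:
    """Hypothetical monitor: signal a scale-out only when the average CPU
    usage has stayed above the threshold for a whole checking period."""
    threshold: float = CPU_THRESHOLD_PERCENT
    period_s: float = CHECK_PERIOD_SECONDS
    # Time at which the current above-threshold streak began (None if none).
    _above_since: Optional[float] = field(default=None, init=False, repr=False)

    def observe(self, timestamp_s: float, avg_cpu_percent: float) -> bool:
        """Record one sample of the average CPU usage and return True when
        a new VM should be requested."""
        if avg_cpu_percent <= self.threshold:
            # Usage fell back below the threshold: treat it as a spike.
            self._above_since = None
            return False
        if self._above_since is None:
            self._above_since = timestamp_s
        if timestamp_s - self._above_since >= self.period_s:
            # Restart the streak, so that another scale-out again requires
            # a full period above the threshold.
            self._above_since = None
            return True
        return False
```

Assuming, for illustration, one sample every 30 s, a constant 85 % average triggers a request at the fifth sample (t = 120 s), while a single 95 % reading followed by 40 % triggers nothing. Under this policy, the earliest moment a new VM can absorb load is roughly the checking period plus the time needed to create the instance, which is why a longer period risks reserving resources too late.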
