S-CLONE - Data Storage for Social Networks: A Socially Aware Approach


interesting pattern. The improvement from S-CLONE is more significant for early

increases in the number of replicas but much less so beyond a sufficiently large

number. For example, in the case M = 32 when S-CLONE is applied on top of the

random partition (Fig. 4.3a), the read cost of S-CLONE drops quickly from 24 to 5

as K increases from 1 to 17, but afterwards the decrease is less significant. The drop

is quicker if S-CLONE is applied on top of the METIS partition. This implies that

although both the random partition and METIS partition offer comparable degrees

of load balancing, to achieve the same improvement rate for the total read load,

we need fewer replicas per user if METIS partitioning is used than if random

partitioning is used. The reason, we conjecture, is that METIS preserves

social locality whereas random partitioning does not. Consequently, social locality

should be given high priority in the storage design.

The superiority of S-CLONE over random replication is clear, especially when

more servers are deployed or when METIS is used for partitioning instead of random

partitioning. For example, on top of random partitioning when M = 32, in order

to achieve a read cost of 5, S-CLONE requires 17 replicas per user (i.e., K = 17)

but random replication requires 26 replicas. On top of METIS partitioning when

M = 32, S-CLONE requires just 3 replicas per user but random replication requires

19 replicas. It is thus important that we take social locality into account not only

when we store the primary data, but also when we replicate it.
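
The comparison above amounts to reading, from each read-cost curve, the smallest number of replicas that reaches a target cost. A minimal sketch follows, assuming each curve is available as a mapping from K to read cost; the function name and the sample numbers are hypothetical and are not taken from Fig. 4.3.

```python
def min_replicas_for_cost(curve, target):
    """Return the smallest K whose read cost is <= target, or None if no K qualifies.

    `curve` is a dict mapping K (replicas per user) to the measured read cost.
    """
    for k in sorted(curve):
        if curve[k] <= target:
            return k
    return None


# Illustrative, made-up curve (NOT data from the book's figures).
example_curve = {1: 24, 2: 12, 3: 9, 4: 8}
needed = min_replicas_for_cost(example_curve, target=9)
```

Applying the same function to the curve of each replication scheme gives the replica counts being compared for a fixed target read cost.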

We also observe that, for each given M, there is a value for K that maximizes

the efficiency gap between S-CLONE and random replication. For example, in the

case M = 32 (Fig. 4.3a), this value is K = 15. The gap narrows as K

approaches 1 or M − 1. This is understandable because in these

two extreme cases there is no substantial difference in the replica placement using

either replication scheme. It would be interesting, though, to derive a formula for the

optimal value of K that will maximize the efficiency gap.
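
Lacking such a formula, the gap-maximizing K can be located empirically from measured curves. The sketch below assumes the efficiency gap at K is the read cost of random replication minus that of S-CLONE, which is one plausible reading of the text; the function name and sample values are hypothetical and not drawn from the figures.

```python
def best_gap(cost_random, cost_sclone):
    """Return (K, gap) for the K with the largest gap.

    Assumption: gap(K) = cost_random[K] - cost_sclone[K].
    Both arguments map K to read cost; only K values present in both are considered.
    Ties are resolved in favor of the smallest K.
    """
    common = sorted(set(cost_random) & set(cost_sclone))
    if not common:
        raise ValueError("the two curves share no K values")
    k = max(common, key=lambda kk: cost_random[kk] - cost_sclone[kk])
    return k, cost_random[k] - cost_sclone[k]


# Illustrative, made-up curves (NOT data from the book's figures).
random_curve = {1: 24, 2: 20, 3: 15, 4: 10}
sclone_curve = {1: 24, 2: 12, 3: 9, 4: 8}
k_star, gap = best_gap(random_curve, sclone_curve)
```

In this toy input the gap is zero at the smallest K and small at the largest, mirroring the narrowing at both extremes described above.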

In terms of load balancing, Fig. 4.4 plots the Gini coefficient of S-CLONE for

cases M = 8, M = 16, and M = 32 when it is applied on top of the random

partition and the METIS partition. It is observed that S-CLONE balances the load

better when more servers are deployed or when more replicas are allowed per user.

The Gini coefficient is at most 0.35 when eight servers are deployed and at most

0.17 when 32 servers are deployed. These values are acceptable given the fact that

S-CLONE starts with an existing partition and the results are obtained for the basic

version of S-CLONE with load balancing being the secondary objective, not the

primary. We expect a better Gini coefficient for the enhanced version of S-CLONE,

which enforces a stricter constraint on load balancing.
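
For reference, a Gini coefficient like those reported above can be computed as sketched below. This assumes the coefficient is taken over per-server loads, which this excerpt does not spell out; the function name is hypothetical. A value of 0 means the load is perfectly balanced, and values near 1 mean it is concentrated on a few servers.

```python
def gini(loads):
    """Gini coefficient of a list of non-negative per-server loads."""
    xs = sorted(loads)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no servers or no load: treat as perfectly balanced
    # Standard closed form on sorted values x_1 <= ... <= x_n:
    #   G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


balanced = gini([5, 5, 5, 5])   # equal load on four servers
skewed = gini([0, 0, 0, 10])    # all load on one of four servers
```

With n servers, the most skewed case (all load on one server) yields (n − 1)/n, so the attainable maximum depends on the number of servers.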

4.4 Notes

For OSNs that already employ an arbitrary data partition structure, whose data

need to be replicated, we can increase the extent of social locality during the

replication procedure. S-CLONE is a socially aware replication scheme which,
