S-CLONE - Data Storage for Social Networks: A Socially Aware Approach

Databases Reference

In-Depth Information

4.2

Algorithm

S-CLONE consists of two phases: replicate and adjust .

1. Replicate phase: The procedure starts in this phase. It works in a greedy manner,

sequentially considering a node at a time and finding the best way to replicate its

data. Suppose that the nodes are processed in the order f 1, 2, ..., N g . For each

node i (initially node 1) that has not been processed (i.e., all the nodes 1, 2, ...,

i 1 have been processed), the K replicas are assigned to their corresponding

servers as follows. First, the location histogram below is computed for i

X

N

h i .s/ D

r j p js e ji

(4.1)

j D 1

where h i .s/ is the number of i 's neighbor nodes who are primarily stored at

server s. Then, the top K servers (with highest h i values) are chosen to each

store a replica for user i . If there is a tie, the server with the least write load (so

far) is preferred. A special case occurs where there are not enough K servers in

the histogram; in this case, the remaining replicas for node i will be placed in the

Adjust phase below.

2. Adjust phase: The goal of this phase is to find the servers for the remaining

replicas that cannot be placed due to the special case aforementioned above.

Because the read cost is the same no matter where to store these replicas, their

locations are chosen to maximize load balancing. The best way to do this is to

process each remaining replica one by one and place it on the server that currently

has the least load.

After the K replicas of i have been placed, the algorithm moves on to process the

next user, node .i C 1/. Note that in choosing the K servers to replicate the data

for a user, we do not consider its primary server because it makes no sense to put

a replica and its primary on the same server. Also, as only the location information

of the primary copies is used to determine where to replicate a user, the order to

process the users does not have impact on the final replica placement. This is a

desirable property because there may be different structures to represent a social

graph and S-CLONE can work with any structure unbiasedly.

The algorithm above is the basic version of S-CLONE, which treats load

balancing as the secondary objective. For systems that require a better degree of

load balancing, we can extend S-CLONE by enforcing the following constraint:

X

N

X

N

write

s

write

8 s W l

D

w i .p is C x is / .1 C "/.L

=M / D .1 C "/.K C 1/

w i =M

i D 1

i.e., the write load of each server cannot exceed the average write load by a factor

of ", a small value representing the desired load balancing. Here, the formulas for

Data Storage for Social Networks: A Socially Aware Approach

Search WWH ::

Custom Search

Home