Information Technology Reference
In-Depth Information
then it moves to Group B and a new empty disk is supplied to Group A from
the disk pool.
In Group B, to skew the workload, each disk periodically (e.g., once every
hour) exchanges part of its stored data one-for-one following four steps.
Step 1: The disk (say, D 1 ) with the smallest ID randomly chooses a disk (say,
D 2 ).
Step 2: D 1 and D 2 exchange information of their current workloads.
Step 3: If D 1 is more popular than D 2 , the least frequently accessed file (say,
f L )on D 1 is exchanged for the most frequently accessed file (say, f H )on
D 2 , and this process is repeated while the frequency of access of f L is less
than that of f H .
Step 4: D 1 issues an instruction to the next disk.
When implementing this process, the access frequency of popular disks gradually
increases, until the capacity of the disks is exceeded, by gathering popular data
from the neighboring disks, and the opposite process is carried out for unpopular
disks so as to extend their time spent in low-power mode. During this exchange
process, if a disk eventually becomes empty, it is removed from Group B and
queued in the Empty disk pool for the supply of disks to Group A.
Here, we may consider restrictions on the exchange to reduce migration cost,
such as limits on the number of exchangeable files during any one exchange, the
number of migrations for each file, and the scope of exchangeable disks. In this
study, we evaluated the effects of restrictions on the energy performance of disks,
which are described in later sections.
The underlying lookup service used by clients to access data is managed by a
distributed hash table, such as Chord [7]. However, for adaption to our system
in which data are frequently moved among disks, we use pointers in a list of
key-value pairs to indicate the current disk on which data are stored. That is, if
afile f 1 stored on disk D 1 migrates to D 2 ,thekeyof f 1 is associated with the
destination of migration (i.e., D 2 )in D 1 's list of key-value pairs. In addition,
to avoid multiple hops from pointer to pointer, each file is assumed to have
information of the ID of the disk originally storing the file as metadata, and if
f 1 originally stored on D 1 and moved to D 2 is moved further to D 3 , the pointer
of D 1 is rewritten so as to directly indicate the current position.
4 Data Access Tracing in Flickr
To evaluate the effectiveness of our method in a realistic situation, as a prelim-
inary study, we traced access patterns of photo data uploaded to Flickr, which
is one of the largest photo sharing services in the world.
In this preliminary study, we randomly selected 20,000 photos and traced the
cumulative number of accesses for each file every hour over two weeks with APIs
provided by the website. Owing to limitations of accessible observable data, all
the selected photos are public, although the website has supposedly around four
 
Search WWH ::




Custom Search