records an id and a timestamp in each row. A heartbeat plug-in for Cloudstone periodically inserts a new row with a global id and a local timestamp into the master during the experiment. Once the insert query is replicated to the slaves, every slave re-executes it, committing the global id together with its own local timestamp. The replication delay from the master to each slave is then calculated as the difference between the two timestamps.
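To make the mechanism concrete, the following SQL sketch shows a minimal heartbeat table, the periodic insert, and the delay query; the table and column names are illustrative rather than the plug-in's actual schema, and now_usec() stands for the microsecond time function discussed below:

    -- Illustrative heartbeat table; the plug-in's actual schema may differ.
    CREATE TABLE heartbeat (
        global_id BIGINT PRIMARY KEY,   -- identical on the master and all slaves
        ts        VARCHAR(26) NOT NULL  -- local timestamp with microseconds
    );

    -- Issued periodically on the master. Unlike NOW(), whose value is carried
    -- in the binary log, a user-defined function is re-evaluated when a slave
    -- replays the statement, so each slave commits its own local clock reading.
    INSERT INTO heartbeat (global_id, ts) VALUES (42, now_usec());

    -- Pairing the rows collected from the master and from one slave on the
    -- shared global id; the subtraction itself is done offline with
    -- microsecond arithmetic.
    SELECT m.global_id, m.ts AS master_ts, s.ts AS slave_ts
    FROM   master_heartbeat AS m
    JOIN   slave_heartbeat  AS s USING (global_id);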
In practice, there are two challenges in achieving a fine-grained measurement of the replication delay: the resolution of the time/date function and the clock synchronization between the master and the slaves. The time/date function offered by MySQL has a resolution of one second, which is inadequate because accurately measuring the replication delay requires higher precision. We therefore implemented a user-defined time/date function with microsecond resolution, based on a proposed solution to MySQL Bug #8523.* Clock synchronization between the master and the slaves is maintained by NTP (Network Time Protocol)† on Amazon EC2. We configured NTP to synchronize with multiple time servers every second to achieve better accuracy.
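As an illustration, such a user-defined function is compiled into a shared library and registered through MySQL's standard UDF mechanism; the library and function names below are placeholders, not necessarily those from the bug report:

    -- Hypothetical registration of a microsecond-resolution time function.
    CREATE FUNCTION now_usec RETURNS STRING SONAME 'now_usec.so';

    -- Returns the local wall-clock time with microsecond precision,
    -- e.g. '2010-06-15 14:05:12.348213', where NOW() would return
    -- only '2010-06-15 14:05:12'.
    SELECT now_usec();

Because a slave can only replay a statement that calls this function if the function exists locally, the same library would need to be deployed and registered on the master and on every slave before the experiment starts.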
With the customized Cloudstone‡ and the heartbeat plug-in, we are able to achieve our goal of measuring the end-to-end database throughput and the replication delay. In particular, we defined two configurations with read/write ratios of 50/50 and 80/20. We also defined three configurations of geographical locations based on availability zones (distinct locations within a region) and regions (separate geographic areas or countries), as follows: same zone, where all slaves are deployed in the same availability zone as the master database; different zones, where the slaves are in the same region as the master database but in different availability zones; and different regions, where all slaves are deployed in a different region from the one hosting the master database. The workload and the number of database replicas start small and are gradually increased in fixed steps; both stop increasing once no further throughput is gained.
11.5.2 Experiment Setup
We conducted our replication experiments on Amazon EC2 with a three-layer implementation (Figure 11.3). The first layer is the Cloudstone benchmark, which controls the read/write ratio and the workload by separately adjusting the numbers of read and write operations and the number of concurrent users. Because emulating a large number of concurrent users can be very resource-consuming, the benchmark is deployed on a large instance to avoid overloading the application tier. The second layer is the master database, which receives the write operations from the benchmark and is responsible for propagating the write sets to the slaves. The master database runs on a small instance so that saturation can be expected to be observed
* http://bugs.mysql.com/bug.php?id=8523.
† http://www.ntp.org/.
‡ The source code of our customized Cloudstone implementation is available at http://code.google.com/p/clouddbreplication/.