CloudDB AutoAdmin - Large Scale and Big Data: Processing and Management


that allows users to perform individual operations (e.g., browsing, searching, creating events) as well as social operations (e.g., joining, tagging events) [19]. Unlike Web 1.0 applications, the more interactive nature of Web 2.0 applications places different demands on the database tier. One difference is the write pattern: because the content of Web 2.0 applications depends on user contributions via blogs, photos, videos, and tags, more write transactions are expected to be processed. Another difference is the tolerance with regard to data consistency. In general, Web 2.0 applications are more tolerant of data staleness. For example, it might not be a mission-critical goal for a social network application (e.g., Facebook) to make a user's new status immediately available to his or her friends; a consistency window of some seconds (or even some minutes) would still be acceptable. Therefore, we believe that the design and workload characteristics of the Cloudstone benchmark are more suitable for the purpose of our study than those of other benchmarks such as TPC-W* or RUBiS,† which are more representative of Web 1.0-like applications.

The original software stack of Cloudstone consists of three components: a web application, a database, and a load generator. Throughout the benchmark, the load generator generates load against the web application, which in turn makes use of the database. The benchmark is well designed for measuring the performance of each tier of a Web 2.0 application. However, its original design makes it hard to push the database to its performance limits, which reduces its suitability for our experiments, which focus mainly on the database tier of the software stack. In general, a user's operation sent by the load generator has to be interpreted as database transactions in the web tier, based on predefined business logic, before the request is passed to the database tier. Thus, the web tier usually saturates earlier than the database tier. Therefore, we modified the design of the original software stack by removing the web server tier. In particular, we reimplemented the business logic of the application so that a user's operation can be processed directly at the database tier without any intermediate interpretation at the web server tier. In addition, on top of our Cloudstone implementation, we implemented a connection pool component (i.e., DBCP‡) and a proxy component (i.e., MySQL Connector/J§).

The pool component enables application users to reuse connections that have been released by other users who have completed their operations, which saves the overhead of creating a new connection for each operation. The proxy component works as a load balancer among the available database replicas: all write operations are sent to the master, while all read operations are distributed among the slaves.

The database tier is composed of multiple MySQL replicas. For the purpose of monitoring replication delay in MySQL, we have created a Heartbeats database and a time/date function for each replica. The Heartbeats database, synchronized across replicas in the form of SQL statements, maintains a “heartbeat” table, which

* http://www.tpc.org/tpcw/.

† http://rubis.ow2.org/.

‡ http://commons.apache.org/dbcp/.

§ http://www.mysql.com/products/connector/.
