Cloud-Hosted Data Storage Systems - Cloud Data Management

The Bigtable API provides functions for creating and deleting tables and column

families. It also provides functions for changing cluster, table, and column family

metadata, such as access control rights. Client applications can write or delete values

in Bigtable, look up values from individual rows, or iterate over a subset of the data

in a table. At the transaction level, Bigtable supports only single-row transactions,

which can be used to perform atomic read-modify-write sequences on data stored

under a single row key (i.e., general transactions across row keys are not supported).
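
The single-row guarantee described above can be illustrated with a small in-memory sketch. This is not the Bigtable client API: `ToyTable`, `read_modify_write`, and `bump` are hypothetical names, and a per-row lock stands in for whatever mechanism a real tablet server uses. The only point is that an update to one row key is atomic, while nothing coordinates updates across different row keys.

```python
import threading


class ToyTable:
    """Hypothetical in-memory table with single-row atomicity only."""

    def __init__(self):
        self._rows = {}                  # row key -> {column: value}
        self._row_locks = {}             # row key -> lock guarding that row
        self._guard = threading.Lock()   # protects the two dicts above

    def _lock_for(self, row_key):
        with self._guard:
            return self._row_locks.setdefault(row_key, threading.Lock())

    def read_modify_write(self, row_key, column, update, default=None):
        """Atomically replace one cell with update(old_value)."""
        with self._lock_for(row_key):    # serializes access to this row only
            with self._guard:
                row = self._rows.setdefault(row_key, {})
            new_value = update(row.get(column, default))
            row[column] = new_value
            return new_value

    def read(self, row_key, column, default=None):
        with self._lock_for(row_key):
            with self._guard:
                row = self._rows.get(row_key, {})
            return row.get(column, default)


def bump(table, row_key, times):
    """Increment a counter cell; safe to run from several threads at once."""
    for _ in range(times):
        table.read_modify_write(row_key, b"hits", lambda v: v + 1, default=0)
```

Running `bump` from several threads against the same row key should lose no increments, because each read-modify-write holds that row's lock. An update that had to touch two row keys atomically has no equivalent in this sketch, which mirrors the restriction stated in the text.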

At the physical level, Bigtable uses the distributed Google File System

(GFS) [137] to store log and data files. The Google SSTable file format is used

internally to store Bigtable data. An SSTable provides a persistent, ordered,

immutable map from keys to values, where both keys and values are arbitrary

byte strings. Bigtable relies on a distributed lock service called Chubby [90], which

consists of five active replicas, one of which is elected to be the master and actively

serves requests. The service is live when a majority of the replicas are running and

can communicate with each other. Bigtable uses Chubby for a variety of tasks

such as (1) ensuring that there is at most one active master at any time, (2) storing

the bootstrap location of Bigtable data, and (3) storing Bigtable schema information

and the access control lists. The main limitation of this design is that if Chubby

becomes unavailable for an extended period of time, the whole Bigtable becomes

unavailable. At runtime, each Bigtable cluster is allocated one master server and

many tablet servers, which can be dynamically added to (or removed from) the cluster

based on changes in workload. The master server is responsible for assigning

tablets to tablet servers, balancing tablet-server load, and garbage collection of files

in GFS. In addition, it handles schema changes such as table and column family

creation. Each tablet server manages a set of tablets. The tablet server handles read

and write requests to the tablets that it has loaded, and also splits tablets that have

grown too large.
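
Two of the building blocks described above, the SSTable map and the Chubby majority rule, can be sketched in a few lines. This is an illustrative in-memory model, not Google's on-disk format or the Chubby protocol: `ToySSTable`, `scan`, and `lock_service_is_live` are hypothetical names, and persistence is left out entirely.

```python
from bisect import bisect_left


class ToySSTable:
    """Immutable, sorted map from byte-string keys to byte-string values."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())   # sort once; never mutated afterwards
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key):
        """Binary-search for key; return its value or None if absent."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._values[i]
            i += 1


def lock_service_is_live(reachable_replicas, total_replicas=5):
    """True when a strict majority of replicas can communicate (assumed rule)."""
    return reachable_replicas > total_replicas // 2
```

Because the keys are kept sorted, both point lookups and range scans reduce to a binary search followed by a sequential read, which is what makes an ordered map convenient for iterating over a subset of rows. With five replicas, the liveness check passes with three reachable replicas and fails with two.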

Yahoo: PNUTS

The PNUTS system (later renamed Sherpa) is a massive-scale hosted database

system designed to support Yahoo!'s web applications [111, 209]. The main

focus of the system is on data serving for web applications, rather than complex

queries. It relies on a simple relational model where data is organized into tables

of records with attributes. In addition to typical data types, blob is a valid

data type that allows arbitrary structures to be stored inside a record, though not

necessarily large binary objects such as images or audio. The PNUTS system does not

enforce constraints such as referential integrity on the underlying data. Therefore,

the schemas of these tables are flexible: new attributes can be added at any time

without halting any query or update activity. In addition, it is not required that each

record have values for all attributes.
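
The flexible-schema behaviour described above can be sketched as follows. This is a toy model, not the PNUTS API: `ToyPnutsTable` and its methods are hypothetical, the blob is represented as JSON-encoded bytes purely for illustration, and rejecting undeclared attributes is an assumption of the sketch.

```python
import json


class ToyPnutsTable:
    """Hypothetical table of records with a flexible set of attributes."""

    def __init__(self, attributes):
        self.attributes = set(attributes)   # declared attribute names
        self.records = {}                   # primary key -> {attribute: value}

    def add_attribute(self, name):
        # Metadata-only change: existing records are not rewritten.
        self.attributes.add(name)

    def put(self, key, **values):
        unknown = set(values) - self.attributes
        if unknown:   # assumption of this sketch: undeclared names are rejected
            raise KeyError(f"undeclared attributes: {sorted(unknown)}")
        self.records.setdefault(key, {}).update(values)

    def get(self, key, attribute):
        # A record need not carry every attribute; missing ones read as None.
        return self.records.get(key, {}).get(attribute)


users = ToyPnutsTable(["name"])
users.put("u1", name="Ada")
users.add_attribute("prefs")                # added while the table is in use
users.put("u2", name="Bo", prefs=json.dumps({"theme": "dark"}).encode())
```

Here record `u1` was written before `prefs` existed and simply has no value for it, while `u2` stores an arbitrary structure in the blob-like `prefs` attribute.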

Figure 3.3 illustrates the system architecture of PNUTS. The system is divided

into regions where each region contains a full complement of system components
