Database Reference
The Bigtable API provides functions for creating and deleting tables and column
families. It also provides functions for changing cluster, table, and column family
metadata, such as access control rights. Client applications can write or delete values
in Bigtable, look up values from individual rows, or iterate over a subset of the data
in a table. At the transaction level, Bigtable supports only single-row transactions,
which can be used to perform atomic read-modify-write sequences on data stored
under a single row key (i.e., there are no general transactions across row keys).
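The single-row guarantee can be illustrated with a small in-memory model. The sketch below is not the Bigtable client API: the class `SingleRowTable`, its methods, and the use of one lock per row key are all assumptions made for illustration. It shows only that a read-modify-write on one row is atomic, while nothing coordinates updates across different rows.

```python
import threading


class SingleRowTable:
    """Toy in-memory table with per-row atomicity (hypothetical, not Bigtable's API)."""

    def __init__(self):
        self._rows = {}                  # row key -> {column: value}
        self._locks = {}                 # row key -> lock guarding that row
        self._guard = threading.Lock()   # protects the two dicts above

    def _row_and_lock(self, row_key):
        # Create the row and its lock on first use, under the table-wide guard.
        with self._guard:
            row = self._rows.setdefault(row_key, {})
            lock = self._locks.setdefault(row_key, threading.Lock())
        return row, lock

    def read_modify_write(self, row_key, column, update, default=0):
        """Atomically replace one cell with update(old_value) and return the new value."""
        row, lock = self._row_and_lock(row_key)
        with lock:  # only this row is locked; other rows proceed independently
            new_value = update(row.get(column, default))
            row[column] = new_value
            return new_value

    def read(self, row_key, column, default=None):
        """Return the current value of one cell, or default if it is unset."""
        row, lock = self._row_and_lock(row_key)
        with lock:
            return row.get(column, default)
```

With this model, several threads can call `read_modify_write(b"row1", b"cf:count", lambda v: v + 1)` concurrently without losing increments. An update that must touch two row keys at once has no equivalent here, which mirrors the absence of general cross-row transactions.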
At the physical level, Bigtable uses the distributed Google File System
(GFS) [137] to store log and data files. The Google SSTable file format is used
internally to store Bigtable data. An SSTable provides a persistent, ordered,
immutable map from keys to values, where both keys and values are arbitrary
byte strings. Bigtable relies on a distributed lock service called Chubby [90], which
consists of five active replicas, one of which is elected to be the master and actively
serves requests. The service is live when a majority of the replicas are running and
can communicate with each other. Bigtable uses Chubby for a variety of tasks,
such as (1) ensuring that there is at most one active master at any time, (2) storing
the bootstrap location of Bigtable data, and (3) storing Bigtable schema information
and access control lists. The main limitation of this design is that if Chubby
becomes unavailable for an extended period of time, the whole Bigtable becomes
unavailable. At runtime, each Bigtable cluster is allocated one master server and
many tablet servers, which can be dynamically added to (or removed from) a cluster
in response to changes in workload. The master server is responsible for assigning
tablets to tablet servers, balancing tablet-server load, and garbage-collecting files
in GFS. In addition, it handles schema changes such as table and column family
creation. Each tablet server manages a set of tablets. The tablet server handles read
and write requests to the tablets that it has loaded, and also splits tablets that have
grown too large.
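The SSTable abstraction described above (an ordered, immutable map from byte-string keys to byte-string values) can be sketched in a few lines. This is an illustrative in-memory model only: the class name `SSTableSketch` and its methods are assumptions, the real on-disk file format is not shown, and persistence is not modeled at all.

```python
import bisect


class SSTableSketch:
    """In-memory stand-in for an ordered, immutable bytes -> bytes map (hypothetical)."""

    def __init__(self, items):
        # Sort once at construction time; nothing is ever modified afterwards.
        pairs = sorted(dict(items).items())
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key, default=None):
        """Look up a single key with binary search over the sorted keys."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for i in range(lo, hi):
            yield self._keys[i], self._values[i]
```

Because the keys are kept sorted, both point lookups and range scans reduce to binary searches, and because the structure is never mutated, readers need no locking. Updates in such a design would have to be written to new tables rather than applied in place.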
Yahoo: PNUTS
The PNUTS system (later renamed Sherpa) is a massive-scale hosted database
system designed to support Yahoo!'s web applications [111, 209]. The main
focus of the system is on data serving for web applications, rather than complex
queries. It relies on a simple relational model where data is organized into tables
of records with attributes. In addition to typical data types, blob is a valid
data type, which allows arbitrary structures to be stored inside a record, though not
necessarily large binary objects such as images or audio. The PNUTS system does not
enforce constraints such as referential integrity on the underlying data. The
schemas of these tables are also flexible: new attributes can be added at any time
without halting any query or update activity. In addition, a record is not required
to have values for all attributes.
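The flexible-schema behavior described above can be sketched as follows. This is a toy model, not the PNUTS interface: the class `FlexibleTable` and its methods are hypothetical names introduced for illustration, and a Python `bytes` value stands in for the blob type.

```python
class FlexibleTable:
    """Toy table whose attribute set can grow while records stay sparse (hypothetical)."""

    def __init__(self, attributes):
        self._attributes = set(attributes)  # currently declared attribute names
        self._records = {}                  # record key -> {attribute: value}

    def add_attribute(self, name):
        # Declaring a new attribute does not touch or rewrite existing records.
        self._attributes.add(name)

    def put(self, key, **values):
        """Set some attributes of a record; unspecified attributes stay absent."""
        unknown = set(values) - self._attributes
        if unknown:
            raise KeyError(f"undeclared attributes: {sorted(unknown)}")
        self._records.setdefault(key, {}).update(values)

    def get(self, key, attribute):
        """Return the stored value, or None if the record lacks this attribute."""
        return self._records[key].get(attribute)
```

For example, a table created with only a `name` attribute can later gain an `avatar` attribute through `add_attribute("avatar")`. Records written earlier simply return `None` for it, while new records may store a `bytes` blob there.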
Figure 3.3 illustrates the system architecture of PNUTS. The system is divided
into regions, where each region contains a full complement of system components