9.2.2 Yahoo: PNUTS
The PNUTS system (renamed later to Sherpa) is a massive-scale hosted database
system that is designed to support Yahoo!'s web applications [25,56]. The main focus
of the system is on data serving for web applications, rather than complex queries. It
relies on a simple relational model where data is organized into tables of records with
attributes. In addition to typical data types, blob is a valid data type, which
allows arbitrary structures to be stored inside a record, but not necessarily large
binary objects like images or audio. The PNUTS system does not enforce constraints
such as referential integrity on the underlying data. The schemas of these
tables are flexible: new attributes can be added at any time without halting any
query or update activity. In addition, a record is not required to have values
for all attributes.
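The flexible-schema behavior described above can be illustrated with a small in-memory sketch. This is not the PNUTS API: the `Table` class and its `add_attribute`, `put`, and `get` methods are hypothetical names chosen for illustration, and the example assumes records are simple attribute-to-value maps.

```python
from typing import Any


class Table:
    """Toy table (hypothetical, not the PNUTS API): records keyed by primary key."""

    def __init__(self, name: str, attributes: list[str]) -> None:
        self.name = name
        self.attributes = set(attributes)
        self.records: dict[str, dict[str, Any]] = {}

    def add_attribute(self, attribute: str) -> None:
        # Adding an attribute touches no existing record, so reads and
        # writes never have to pause for a schema change.
        self.attributes.add(attribute)

    def put(self, key: str, **values: Any) -> None:
        unknown = set(values) - self.attributes
        if unknown:
            raise KeyError(f"unknown attributes: {sorted(unknown)}")
        # Only the supplied attributes are stored; the rest stay absent.
        self.records.setdefault(key, {}).update(values)

    def get(self, key: str, attribute: str) -> Any:
        if attribute not in self.attributes:
            raise KeyError(f"unknown attribute: {attribute}")
        # An attribute without a value in this record reads as None.
        return self.records[key].get(attribute)


users = Table("users", ["name", "profile"])
# A blob value can carry an arbitrary serialized structure.
users.put("alice", name="Alice", profile=b'{"theme": "dark"}')
users.add_attribute("city")  # schema grows while the table is in use
users.put("bob", name="Bob", city="Oslo")
```

In this sketch the record for `alice` was written before `city` existed, so reading that attribute simply yields no value instead of requiring a backfill.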
Figure 9.3 illustrates the system architecture of PNUTS. The system is divided
into regions where each region contains a full complement of system components
and a complete copy of each table. Regions are typically, but not necessarily,
geographically distributed. At the physical level, data tables are horizontally
partitioned into groups of records called tablets. These tablets are scattered across
many servers, where each server might have hundreds or thousands of tablets. The
assignment of tablets to servers is flexible, which allows workloads to be balanced
by moving a few tablets from an overloaded server to an underloaded server.
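A minimal sketch of this kind of load balancing follows. It assumes that load can be approximated by the number of tablets a server holds, which is a simplification made here for illustration; the text does not specify how PNUTS measures load or chooses which tablets to move. The function name `rebalance` and the server and tablet names are hypothetical.

```python
def rebalance(
    assignment: dict[str, set[str]], max_moves: int = 10
) -> list[tuple[str, str, str]]:
    """Move tablets from the most loaded server to the least loaded one.

    Assumption: load is approximated by tablet count. Returns the list of
    (tablet, from_server, to_server) moves and updates `assignment` in place.
    """
    moves = []
    for _ in range(max_moves):
        busiest = max(assignment, key=lambda s: len(assignment[s]))
        idlest = min(assignment, key=lambda s: len(assignment[s]))
        # Stop once the servers are within one tablet of each other.
        if len(assignment[busiest]) - len(assignment[idlest]) <= 1:
            break
        tablet = min(assignment[busiest])  # deterministic choice for the sketch
        assignment[busiest].remove(tablet)
        assignment[idlest].add(tablet)
        moves.append((tablet, busiest, idlest))
    return moves


servers = {
    "su1": {"t1", "t2", "t3", "t4", "t5"},
    "su2": {"t6"},
    "su3": {"t7", "t8"},
}
moves = rebalance(servers)
```

Only tablets change owners here; the records inside a tablet move together, which is what makes a few tablet moves enough to relieve an overloaded server.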
The query language of PNUTS supports selection and projection from a single
table. Operations for updating or deleting existing records must specify the primary
key. The system is designed primarily for online serving workloads that consist
mostly of queries that read and write single records or small groups of records. Thus,
it provides a multiget operation that supports retrieving multiple records in parallel
by specifying a set of primary keys and an optional predicate. The router component
(Figure 9.3) is responsible for determining which storage unit needs to be accessed for
a given record to be read or written by the client. To make this possible, the
primary-key space of a table is divided into intervals, where each interval corresponds
to one tablet. The router stores an interval mapping that defines the boundaries of each
tablet and maps each tablet to a storage unit. The query model of PNUTS does not support
join operations, which are too expensive in such massive-scale systems.
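The interval mapping held by the router can be sketched as a sorted list of boundary keys searched with binary search. The `Router` class, its method names, the string keys, and the storage-unit names below are all hypothetical; the sketch also assumes each tablet's lower boundary is inclusive. The `plan_multiget` method shows one plausible way a multiget could be split so that each storage unit is contacted once; it omits the optional predicate.

```python
from bisect import bisect_right
from collections import defaultdict


class Router:
    """Toy interval mapping (hypothetical): key ranges -> tablets -> storage units."""

    def __init__(self, boundaries: list[str], storage_units: list[str]) -> None:
        # boundaries[i] is the lowest key of tablet i + 1; tablet 0 covers
        # every key below boundaries[0].
        if boundaries != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        if len(storage_units) != len(boundaries) + 1:
            raise ValueError("need one storage unit per tablet")
        self.boundaries = boundaries
        self.storage_units = storage_units

    def tablet_for(self, key: str) -> int:
        # Binary search: count how many boundaries are <= key.
        return bisect_right(self.boundaries, key)

    def storage_unit_for(self, key: str) -> str:
        return self.storage_units[self.tablet_for(key)]

    def plan_multiget(self, keys: list[str]) -> dict[str, list[str]]:
        # Group the requested keys by storage unit so that the per-unit
        # requests could be issued in parallel.
        plan = defaultdict(list)
        for key in keys:
            plan[self.storage_unit_for(key)].append(key)
        return dict(plan)


# Four tablets: [min, "g"), ["g", "n"), ["n", "t"), ["t", max).
router = Router(["g", "n", "t"], ["su1", "su2", "su1", "su3"])
plan = router.plan_multiget(["apple", "grape", "peach", "zebra"])
```

Because the mapping goes from tablet to storage unit rather than from key to storage unit, moving a tablet in this sketch only means changing one entry in `storage_units`; the boundaries stay untouched.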
FIGURE 9.3
PNUTS system architecture. Components shown: Region 1 and Region 2, each with
routers, a tablet controller, and storage units; and a message broker.
(From B. F. Cooper et al., PVLDB, 1, 1277-1288, 2008.)