This chapter explores recent advancements and new approaches to web-scale data
management. We discuss the advantages and disadvantages of each approach and its
suitability for supporting certain classes of applications and end users. Section
9.2 describes the NoSQL systems that were introduced and are used internally by three
key players: Google, Yahoo, and Amazon. Section 9.3 provides an overview of
a set of open-source projects that have been designed following the main principles
of the NoSQL systems. Section 9.4 discusses the notion of providing database
management as a service and gives an overview of the main representative systems and
their challenges. The tradeoffs and open research challenges of web-scale data
management are discussed in Section 9.5 before we conclude the chapter in Section 9.7.
9.2 NoSQL KEY SYSTEMS
This section provides an overview of the main NoSQL systems that have been
introduced and used internally by three of the key players in the web-scale data
management domain: Google, Yahoo, and Amazon.
9.2.1 Google: Bigtable
Bigtable is a distributed storage system for managing structured data that is designed
to scale to a very large size (petabytes of data) across thousands of commodity
servers [21]. It has been used by more than 60 Google products and projects such as
the Google search engine,* Google Finance,† Orkut,‡ Google Docs,§ and Google Earth.¶
These products use Bigtable for a variety of demanding workloads, which range
from throughput-oriented batch-processing jobs to latency-sensitive serving of data
to end users. The Bigtable clusters used by these products span a wide range of
configurations, from a handful to thousands of servers, and store up to several hundred
terabytes of data.
Bigtable does not support a full relational data model. However, it provides clients
with a simple data model that supports dynamic control over data layout and format. In
particular, a Bigtable is a sparse, distributed, persistent multidimensional sorted map.
The map is indexed by a row key, a column key, and a timestamp. Each value in the
map is an uninterpreted array of bytes. Thus, clients usually need to serialize various
forms of structured and semistructured data into these strings. A concrete example that
reflects some of the main design decisions of Bigtable is the scenario of storing a copy
of a large collection of web pages into a single table. Figure 9.2 illustrates an example
of this table where URLs are used as row keys and various aspects of web pages as
column names. The contents of the web pages are stored in a single column that stores
multiple versions of the page under the timestamps when they were fetched.
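
To make the data model concrete, the following is a minimal Python sketch of a sparse map indexed by (row key, column key, timestamp) whose values are raw bytes. It is a toy in-memory illustration, not Bigtable's implementation or API: the class name `ToyTable`, its methods, the column names `contents` and `title`, and the example URL and timestamps are all assumptions made for this example.

```python
from typing import Dict, Iterator, Optional, Tuple


class ToyTable:
    """Toy in-memory model of a (row, column, timestamp) -> bytes map."""

    def __init__(self) -> None:
        # Sparse storage: only cells that were actually written exist.
        # Maps (row key, column key) -> {timestamp: value bytes}.
        self._cells: Dict[Tuple[str, str], Dict[int, bytes]] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        """Store one version of a cell; the value is an opaque byte string."""
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the newest version, or the newest one at or before `timestamp`."""
        versions = self._cells.get((row, column), {})
        candidates = [ts for ts in versions
                      if timestamp is None or ts <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def scan(self) -> Iterator[Tuple[str, str, int, bytes]]:
        """Yield all cells sorted by (row, column), newest version first."""
        for row, column in sorted(self._cells):
            versions = self._cells[(row, column)]
            for ts in sorted(versions, reverse=True):
                yield row, column, ts, versions[ts]


# Web-page example: the URL is the row key, and page aspects are columns.
# The client serializes its data to bytes before writing.
table = ToyTable()
url = "http://www.example.com/"  # hypothetical row key
table.put(url, "contents", 1, "<html>first fetch</html>".encode("utf-8"))
table.put(url, "contents", 2, "<html>second fetch</html>".encode("utf-8"))
table.put(url, "title", 2, "Example".encode("utf-8"))

latest = table.get(url, "contents")                  # newest stored version
as_of_first = table.get(url, "contents", timestamp=1)  # version at fetch time 1
```

The table never interprets the bytes it stores, so decoding (here, UTF-8) is left to the client, which mirrors the serialization burden described above.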
The row keys in a table are arbitrary strings, and every read or write of data
under a single row key is atomic. Bigtable maintains the data in lexicographic order
* http://www.google.com/.
† http://www.google.com/finance.
‡ http://www.orkut.com/.
§ http://docs.google.com/.
¶ http://earth.google.com/.