An Overview of the NoSQL World - Large Scale and Big Data: Processing and Management

This chapter explores recent advancements and new approaches in web-scale data management. We discuss the advantages and disadvantages of each approach and its suitability for supporting certain classes of applications and end users. Section 9.2 describes the NoSQL systems that were introduced and are used internally by the key players: Google, Yahoo, and Amazon. Section 9.3 provides an overview of a set of open-source projects that have been designed following the main principles of the NoSQL systems. Section 9.4 discusses the notion of providing database management as a service and gives an overview of the main representative systems and their challenges. The web-scale data management tradeoffs and open research challenges are discussed in Section 9.5 before we conclude the chapter in Section 9.7.

9.2 NoSQL KEY SYSTEMS

This section provides an overview of the main NoSQL systems that have been introduced and used internally by three of the key players in the web-scale data management domain: Google, Yahoo, and Amazon.

9.2.1 Google: Bigtable

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size (petabytes of data) across thousands of commodity servers [21]. It has been used by more than 60 Google products and projects, such as the Google search engine,* Google Finance,† Orkut,‡ Google Docs,§ and Google Earth.¶ These products use Bigtable for a variety of demanding workloads, which range from throughput-oriented batch-processing jobs to latency-sensitive serving of data to end users. The Bigtable clusters used by these products span a wide range of configurations, from a handful to thousands of servers, and store up to several hundred terabytes of data.

Bigtable does not support a full relational data model. Instead, it provides clients with a simple data model that supports dynamic control over data layout and format. In particular, a Bigtable is a sparse, distributed, persistent, multidimensional sorted map. The map is indexed by a row key, a column key, and a timestamp. Each value in the map is an uninterpreted array of bytes, so clients usually need to serialize various forms of structured and semistructured data into these byte strings. A concrete example that reflects some of the main design decisions of Bigtable is the scenario of storing a copy of a large collection of web pages in a single table. Figure 9.2 illustrates an example of this table, where URLs are used as row keys and various aspects of web pages are used as column names. The contents of the web pages are stored in a single column that keeps multiple versions of each page under the timestamps at which they were fetched.
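The data model described above can be illustrated with a small in-memory sketch. This is not Bigtable's API or implementation; the class name ToyTable, its methods, the column names "contents:" and "language", the example URL, and the integer timestamps are all hypothetical and chosen only for illustration. The sketch assumes that a read without a timestamp returns the newest version, and that a read with a timestamp returns the newest version stored at or before it.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ToyTable:
    """Toy sketch of a sparse map: (row key, column key, timestamp) -> bytes."""

    def __init__(self) -> None:
        # row key -> column key -> list of (timestamp, value), newest first.
        # Only cells that were actually written take up space (sparse).
        self._rows: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        """Store one version of a cell; values are uninterpreted bytes."""
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(
        self, row: str, column: str, timestamp: Optional[int] = None
    ) -> Optional[bytes]:
        """Return the newest version, or the newest one at or before `timestamp`."""
        # Check membership first so that reads never create empty entries.
        if row not in self._rows or column not in self._rows[row]:
            return None
        for ts, value in self._rows[row][column]:
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def row_keys(self) -> List[str]:
        """Row keys in lexicographic order, mirroring the sorted-map view."""
        return sorted(self._rows)


# Hypothetical web-page table: the URL is the row key, and the page contents
# are stored in one column under the timestamps at which they were fetched.
table = ToyTable()
table.put("www.example.com", "contents:", 3, "<html>v1</html>".encode("utf-8"))
table.put("www.example.com", "contents:", 5, "<html>v2</html>".encode("utf-8"))
table.put("www.example.com", "language", 5, b"EN")

latest = table.get("www.example.com", "contents:")
older = table.get("www.example.com", "contents:", timestamp=4)
```

Here the client, not the table, decides how to serialize each page into bytes (UTF-8 in this sketch), which reflects the point that stored values are uninterpreted.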

The row keys in a table are arbitrary strings, and every read or write of data under a single row key is atomic. Bigtable maintains the data in lexicographic order

* http://www.google.com/.

† http://www.google.com/finance.

‡ http://www.orkut.com/.

§ http://docs.google.com/.

¶ http://earth.google.com/.
