An Overview of the NoSQL World - Large Scale and Big Data: Processing and Management


in addition to storing it locally. Node D will store the keys that fall in the ranges (A, B), (B, C), and (C, D). The list of nodes that are responsible for storing a particular key is called the preference list. The system is designed so that every node in the system can determine which nodes should be in this list for any particular key.
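A minimal sketch of how any node could compute the preference list on its own may help here. It assumes a simplified ring in which each node owns a single position and a key is stored on the first n nodes found clockwise from the key's position; virtual nodes and failure handling are left out. The function names, the MD5-based `ring_position` helper, and the numeric positions are illustrative assumptions, not taken from the text.

```python
import bisect
import hashlib


def ring_position(name, ring_size=2**32):
    """Map a key or node name onto the ring (hypothetical helper, MD5-based)."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % ring_size


def preference_list(key_position, ring, n=3):
    """Return the first n nodes found clockwise from key_position.

    ring is a list of (position, node_name) pairs. Every node that knows
    the same ring computes the same list, so no central lookup is needed.
    """
    ordered = sorted(ring)
    positions = [pos for pos, _ in ordered]
    # First node whose position is at or after the key, wrapping around.
    start = bisect.bisect_left(positions, key_position) % len(ordered)
    count = min(n, len(ordered))
    return [ordered[(start + i) % len(ordered)][1] for i in range(count)]


# Illustrative ring with made-up positions for nodes A to D.
ring = [(10, "A"), (20, "B"), (30, "C"), (40, "D")]
for key_position in (15, 25, 35):
    print(key_position, preference_list(key_position, ring))
```

With n = 3, node D appears in the list for keys at positions 15, 25, and 35, that is, for the ranges (A, B), (B, C), and (C, D) mentioned above.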

9.3 NoSQL OPEN SOURCE PROJECTS

In practice, most NoSQL data management systems introduced by the key players (e.g., Bigtable, Dynamo, PNUTS) are meant for internal use only and are thus not available to public users. Therefore, many open-source projects have been built to implement the concepts of these systems and make them available to public users [18,54]. Because of the ease with which they can be downloaded and installed, these systems have attracted a lot of interest from the research community. Few details have been published about the implementation of most of these systems. In general, the NoSQL open-source projects can be broadly classified into the following categories:

• Key-value stores: These systems use the simplest data model, which is a collection of objects where each object has a unique key and a set of attribute/value pairs.

• Document stores: These systems have data models that consist of objects with a variable number of attributes, with the possibility of having nested objects.

• Extensible record stores: These systems provide variable-width tables (column families) that can be partitioned vertically and horizontally across multiple nodes.
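The three categories can be contrasted with a small sketch that uses plain Python dictionaries as stand-ins for the stores. The sample records, the family names, and the two partitioning helpers are hypothetical and only illustrate the shape of each data model; they do not reflect the API of any particular system.

```python
import zlib

# Key-value store: each unique key maps to a flat set of attribute/value pairs.
key_value_store = {
    "user:42": {"name": "Ada", "city": "London"},
}

# Document store: objects may differ in their attributes and may nest.
document_store = {
    "user:42": {"name": "Ada", "addresses": [{"city": "London"}]},
    "user:43": {"name": "Alan"},
}

# Extensible record store: variable-width rows grouped into column families.
record_store = {
    "user:42": {"profile": {"name": "Ada"}, "activity": {"logins": 7}},
    "user:43": {"profile": {"name": "Alan"}},
}


def vertical_partition(table, family):
    """Keep a single column family of every row (split by columns)."""
    return {key: row[family] for key, row in table.items() if family in row}


def horizontal_partition(table, num_nodes):
    """Spread whole rows over num_nodes buckets by a hash of the row key."""
    buckets = [{} for _ in range(num_nodes)]
    for key, row in table.items():
        buckets[zlib.crc32(key.encode("utf-8")) % num_nodes][key] = row
    return buckets


print(vertical_partition(record_store, "activity"))
print(horizontal_partition(record_store, 2))
```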

Here, we give a brief introduction to some of these projects. For the full list, we refer the reader to the NoSQL database website.*

Cassandra† is presented as a highly scalable, eventually consistent, distributed, structured key-value store [44,45]. It was open-sourced by Facebook in 2008. It was designed by Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik (a Facebook engineer). Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google's Bigtable. Like Dynamo, Cassandra is eventually consistent. Like Bigtable, Cassandra provides a column family-based data model that is richer than typical key/value systems.

In Cassandra's data model, a column is the smallest unit of data. It is a tuple (triplet) that contains a name, a value, and a timestamp. A column family is a container for columns, analogous to a table in a relational system. It contains multiple columns, each of which has a name, a value, and a timestamp, and which are referenced by row keys. A keyspace is the first dimension of the Cassandra hash and is the container for column families. Keyspaces are of roughly the same granularity as a schema or database (i.e., a logical collection of tables) in an RDBMS.
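The nesting described above (keyspace, column family, row key, column) can be sketched with a few Python dataclasses. This is an in-memory illustration only, not Cassandra's API: the class and method names are assumptions, and the rule that the column with the newer timestamp wins on insert is a simplification added here to show why each column carries a timestamp.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Column:
    """Smallest unit of data: a (name, value, timestamp) triplet."""
    name: str
    value: str
    timestamp: int


@dataclass
class ColumnFamily:
    """Container for columns, which are referenced by row key."""
    name: str
    rows: Dict[str, Dict[str, Column]] = field(default_factory=dict)

    def insert(self, row_key: str, column: Column) -> None:
        row = self.rows.setdefault(row_key, {})
        current = row.get(column.name)
        # Assumed rule for this sketch: the newer timestamp wins.
        if current is None or column.timestamp >= current.timestamp:
            row[column.name] = column

    def get(self, row_key: str, column_name: str) -> Column:
        return self.rows[row_key][column_name]


@dataclass
class Keyspace:
    """Outermost container, holding the column families."""
    name: str
    column_families: Dict[str, ColumnFamily] = field(default_factory=dict)


# Hypothetical keyspace and column family names.
keyspace = Keyspace("app")
users = keyspace.column_families.setdefault("users", ColumnFamily("users"))
users.insert("user:42", Column("city", "London", timestamp=2))
users.insert("user:42", Column("city", "Paris", timestamp=1))  # older, ignored
print(users.get("user:42", "city").value)
```

Rows in the same column family may hold different sets of columns, which is what makes the model richer than a flat key/value mapping.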

* http://NoSQL-database.org/.

† http://cassandra.apache.org/.
