An Overview of the NoSQL World - Large Scale and Big Data: Processing and Management


the corresponding server. The in-memory data storage is implemented using a distributed memory object caching system called Memcached,* while the on-disk data storage is implemented as an HDFS file residing in the Hadoop data node server.

The Hypertable† project is designed to provide a high-performance, scalable, distributed storage and processing system for structured and unstructured data. It is designed to manage the storage and processing of information on a large cluster of commodity servers, providing resilience to machine and component failures. Like HBase, Hypertable also runs over HDFS to leverage the automatic data replication and fault tolerance that it provides. In Hypertable, data is represented in the system as a multidimensional table of information. The Hypertable system provides a low-level API and the Hypertable Query Language (HQL), which together provide the ability to create, modify, and query the underlying tables. The data in a table can be transformed and organized at high speed by performing computations in parallel, pushing them to where the data is physically stored.

CouchDB‡ is a document-oriented database that is written in Erlang and can be queried and indexed in a MapReduce fashion using JavaScript. In CouchDB, documents are the primary unit of data. A CouchDB document is an object that consists of named fields. Field values may be strings, numbers, dates, or even ordered lists and associative maps. Hence, a CouchDB database is a flat collection of documents where each document is identified by a unique ID. CouchDB provides a RESTful HTTP API for reading and updating (add, edit, delete) database documents. The CouchDB document update model is lockless and optimistic. Document edits are made by client applications. If another client edited the same document in the meantime, the client gets an edit conflict error on save. To resolve the update conflict, the latest document version can be opened, the edits reapplied, and the update retried. Document updates are all or nothing, either succeeding entirely or failing completely. The database never contains partially saved or edited documents.
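The lockless, optimistic update cycle described above (read, edit, save, and on conflict reopen the latest version and reapply the edit) can be sketched in a few lines of Python. This is a toy in-memory model, not CouchDB's HTTP API: the class `ToyDocumentStore`, the exception `ConflictError`, the helper `update_with_retry`, and the integer revision counter are all hypothetical names and simplifications introduced for illustration only.

```python
class ConflictError(Exception):
    """Raised when a save is attempted against a stale revision."""


class ToyDocumentStore:
    """Hypothetical in-memory stand-in for a document store with optimistic updates."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision number, fields dict)

    def get(self, doc_id):
        rev, fields = self._docs[doc_id]
        return rev, dict(fields)  # hand out a copy so each client edits its own version

    def save(self, doc_id, fields, expected_rev=None):
        current_rev = self._docs[doc_id][0] if doc_id in self._docs else None
        if current_rev != expected_rev:
            # Someone else saved first: reject the whole update, store stays unchanged.
            raise ConflictError(f"{doc_id}: expected rev {expected_rev}, found {current_rev}")
        new_rev = (current_rev or 0) + 1
        # The document is swapped in as a whole, so it is never partially saved.
        self._docs[doc_id] = (new_rev, dict(fields))
        return new_rev


def update_with_retry(store, doc_id, apply_edit, max_attempts=5):
    """Reopen the latest version, reapply the edit, and retry on conflict."""
    for _ in range(max_attempts):
        rev, fields = store.get(doc_id)
        apply_edit(fields)
        try:
            return store.save(doc_id, fields, expected_rev=rev)
        except ConflictError:
            continue
    raise ConflictError(f"{doc_id}: gave up after {max_attempts} attempts")


store = ToyDocumentStore()
store.save("invoice-1", {"total": 100})

# Two clients open the same version of the document.
rev_a, doc_a = store.get("invoice-1")
rev_b, doc_b = store.get("invoice-1")

# Client A saves first and succeeds.
doc_a["total"] = 120
store.save("invoice-1", doc_a, expected_rev=rev_a)

# Client B saves against the now-stale revision and gets a conflict.
doc_b["paid"] = True
try:
    store.save("invoice-1", doc_b, expected_rev=rev_b)
    conflict_seen = False
except ConflictError:
    conflict_seen = True

# Client B resolves the conflict by reapplying its edit to the latest version.
final_rev = update_with_retry(store, "invoice-1", lambda d: d.update(paid=True))
```

No locks are held between the read and the save; the revision check at save time is the only coordination, which is why a stale writer has to reopen the latest version and retry.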

MongoDB§ is another example of a distributed, schema-free, document-oriented database, which was created at 10gen.¶ It is implemented in C++ but provides drivers for a number of programming languages including C, C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, and Scala. It also provides a JavaScript command-line interface. MongoDB stores documents as BSON (Binary JSON), which are binary-encoded JSON-like objects. BSON supports nested object structures with embedded objects and arrays. At the heart of MongoDB is the concept of a document that is represented as an ordered set of keys with associated values. A collection is a group of documents. If a document is the MongoDB analog of a row in a relational database, then a collection can be thought of as the analog to a table. Collections are schema-free. This means that the documents within a single collection can have any number of different shapes. MongoDB groups collections into databases. A single instance of MongoDB can host several databases, each of which can be thought of as completely independent. It provides eventual consistency
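The document, collection, and database hierarchy described above can be illustrated with plain Python data structures. This is only a sketch of the data model under simplifying assumptions: it does not use a MongoDB driver or BSON encoding, and the helper `find`, the database and collection names, and the sample documents are hypothetical.

```python
def find(collection, **criteria):
    """Return documents whose top-level fields equal all of the given criteria."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


# A collection is a group of documents; each document is an ordered set of
# keys with associated values (Python dicts preserve insertion order).
# The two documents deliberately have different shapes: one carries an
# embedded array, the other an embedded object.
people = [
    {"_id": 1, "name": "Ada", "languages": ["Python", "C++"]},
    {"_id": 2, "name": "Lin", "address": {"city": "Oslo", "zip": "0150"}},
]

# One instance hosts several independent databases, each grouping collections.
databases = {
    "hr": {"people": people},
    "shop": {"orders": []},
}

# The set of distinct key layouts within a single collection.
shapes = {tuple(doc) for doc in people}

matches = find(databases["hr"]["people"], name="Lin")
```

Nothing forces the two documents in `people` to share fields, which is what schema-free means here; a relational table would instead require every row to follow one column layout.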

* http://memcached.org/.

† http://hypertable.org/.

‡ http://couchdb.apache.org/.

§ http://www.mongodb.org/.

¶ http://www.10gen.com/.
