Indexing and Ordering Data Sets - Professional NoSQL - page 166

are automatic and triggered for all changed data sets when they are first read after the change.

To understand this further, step back a bit and review the essential mechanics of data access in CouchDB. CouchDB follows the MapReduce style of data manipulation. A map function emits key/value pairs based on the collection data, and these pairs make up the view results. When such a view is accessed for the first time, a B-tree index is built out of this data. On subsequent queries, the data is returned from the B-tree and the underlying data is untouched. This means that every query after the very first one leverages the B-tree index.
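The build-on-first-access behavior described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: real CouchDB views are defined by map functions stored in design documents and are backed by an on-disk B-tree, whereas here a sorted Python list stands in for the B-tree. The class name LazyView, the by_make map function, and the sample documents are all hypothetical.

```python
import bisect


class LazyView:
    """Toy model of a view: a map function plus a lazily built sorted index."""

    def __init__(self, docs, map_fn):
        self.docs = docs        # the underlying documents
        self.map_fn = map_fn    # yields (key, value) pairs for one document
        self.index = None       # built on first query, reused afterwards
        self.keys = []
        self.builds = 0         # counts how often the index was built

    def _build(self):
        rows = []
        for doc in self.docs:
            rows.extend(self.map_fn(doc))
        rows.sort(key=lambda row: row[0])   # sorted list stands in for the B-tree
        self.index = rows
        self.keys = [row[0] for row in rows]
        self.builds += 1

    def query(self, start_key, end_key):
        # Only the first query pays the cost of running the map function.
        if self.index is None:
            self._build()
        lo = bisect.bisect_left(self.keys, start_key)
        hi = bisect.bisect_right(self.keys, end_key)
        return self.index[lo:hi]


def by_make(doc):
    # Hypothetical map function: emit the car's make as key, its model as value.
    if "make" in doc:
        yield doc["make"], doc["model"]


docs = [
    {"_id": "1", "make": "toyota", "model": "corolla"},
    {"_id": "2", "make": "ford", "model": "focus"},
    {"_id": "3", "make": "honda", "model": "civic"},
]
view = LazyView(docs, by_make)
first = view.query("ford", "honda")       # triggers the index build
second = view.query("toyota", "toyota")   # served from the existing index
```

The sketch does not model incremental updates; it only shows that the map step runs once and later range queries read the sorted index rather than the documents.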

The B-tree Index in CouchDB

A B-tree index scales well for large amounts of data. Because the height of a B-tree grows only logarithmically with the number of keys, it remains in single digits despite huge data growth, which allows for fast data retrieval. In CouchDB, the B-tree implementation has specialized features like MultiVersion Concurrency Control and an append-only design.
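The single-digit height claim can be checked with a little arithmetic. The sketch below uses an idealized model in which every node holds the same number of entries; the fanout of 100 is an assumed, illustrative figure and not a documented CouchDB setting.

```python
def btree_height(num_keys, fanout):
    """Smallest number of levels whose capacity (fanout ** levels) covers num_keys."""
    height, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height


# Assumed fanout of 100 entries per node (illustrative only).
heights = {n: btree_height(n, fanout=100) for n in (10**3, 10**6, 10**9, 10**12)}
```

Under this model, going from a thousand keys to a trillion keys raises the height from 2 to 6, so a lookup touches only a handful of nodes even for very large data sets.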

MultiVersion Concurrency Control (MVCC) implies that multiple reads and writes can occur in parallel without the need for exclusive locking. The simplest parallel is distributed version control software like Git. All writes are sequenced and reads are not impacted by writes. Each CouchDB document has a _rev property that holds its most current revision value. As in optimistic locking, writes and reads are coordinated based on the _rev value.

Therefore, each version is the latest one at the time a client starts reading the data. As documents are modified or deleted, the indexes behind the view results are updated.
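The _rev-based coordination can be sketched as a tiny in-memory store that accepts a write only when it carries the current revision. This is a simplified model, not CouchDB's implementation: the RevStore class and Conflict exception are hypothetical names, and the way the revision hash is computed here is an assumption made for illustration. What it mirrors is the general pattern of a revision made of a generation number plus a hash, and the rejection of writes based on a stale revision (CouchDB reports these as a conflict).

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class RevStore:
    def __init__(self):
        self.docs = {}  # doc id -> current document, including its _rev

    def get(self, doc_id):
        # Return a copy so a reader keeps its snapshot even if a write lands later.
        return dict(self.docs[doc_id])

    def put(self, doc_id, doc):
        current = self.docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        if doc.get("_rev") != current_rev:
            raise Conflict(f"{doc_id}: expected _rev {current_rev}, got {doc.get('_rev')}")
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        body = {k: v for k, v in doc.items() if k != "_rev"}
        # Illustrative hash only; the real revision hash is computed differently.
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()[:12]
        new_rev = f"{generation}-{digest}"
        self.docs[doc_id] = {**body, "_rev": new_rev}
        return new_rev


store = RevStore()
rev1 = store.put("car1", {"make": "ford"})

# Two clients read the same revision without blocking each other.
reader_a = store.get("car1")
reader_b = store.get("car1")

reader_a["make"] = "honda"
rev2 = store.put("car1", reader_a)      # accepted: based on the current _rev

reader_b["make"] = "toyota"
try:
    store.put("car1", reader_b)         # rejected: based on the old _rev
    conflict = False
except Conflict:
    conflict = True
```

No lock is taken at any point. The second writer simply learns that its copy is out of date and must re-read the document before retrying.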

The couchdb-lucene project (https://github.com/rnewson/couchdb-lucene) provides full-text search capability using Lucene, the open-source search engine, and CouchDB.

INDEXING IN APACHE CASSANDRA

Column-oriented databases like HBase and Hypertable have a default row-key-based order and index. Indexes on column values, often called secondary indexes, are typically not available out of the box in these databases. HBase has some minimal support for secondary indexes. Hypertable intends to support secondary indexes by the time of its version 1.0 release, which will be available later this year.

Apache Cassandra is a hybrid between a column-oriented database and a pure key/value data store. It incorporates ideas from Google Bigtable and Amazon Dynamo. Like column-oriented databases, Cassandra supports row-key-based order and index by default. In addition, Cassandra supports secondary indexes.
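The difference between the default row-key index and a secondary index on a column value can be pictured with a small in-memory model. This is a conceptual sketch only and says nothing about how Cassandra stores or distributes its indexes. The ColumnFamily class, the row keys, and the make and model columns are hypothetical illustrations.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: rows keyed by row key, plus optional secondary indexes."""

    def __init__(self):
        self.rows = {}        # row key -> {column: value}
        self.secondary = {}   # indexed column -> {value -> set of row keys}

    def create_index(self, column):
        # Build the secondary index from the rows that already exist.
        index = defaultdict(set)
        for key, columns in self.rows.items():
            if column in columns:
                index[columns[column]].add(key)
        self.secondary[column] = index

    def insert(self, key, columns):
        old = self.rows.get(key, {})
        new = {**old, **columns}
        # Keep every secondary index in step with the row data.
        for column, index in self.secondary.items():
            if column in old:
                index[old[column]].discard(key)
            if column in new:
                index[new[column]].add(key)
        self.rows[key] = new

    def get(self, key):
        # Default access path: look up by row key.
        return self.rows[key]

    def find(self, column, value):
        # Secondary access path: look up row keys by column value.
        if column not in self.secondary:
            raise LookupError(f"no secondary index on {column!r}")
        return sorted(self.secondary[column].get(value, ()))


cars = ColumnFamily()
cars.insert("Prius", {"make": "toyota", "model": "prius"})
cars.insert("Corolla", {"make": "toyota", "model": "corolla"})
cars.insert("Focus", {"make": "ford", "model": "focus"})

cars.create_index("make")
toyotas = cars.find("make", "toyota")

cars.insert("Civic", {"make": "honda", "model": "civic"})
hondas = cars.find("make", "honda")
```

Without the index on make, finding all Toyota rows would require scanning every row, which is why databases that offer only a row-key index push that work onto the application.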

Secondary index support in Cassandra is explained using a simple example. You may recall a Cassandra database example with the CarDataStore keyspace and the Cars column-family from Chapter 2. The same example is revisited here to explain support for secondary indexes.
