are automatic and triggered for all changed data sets when they are first read after the change.
To understand this further, step back a bit to review the essential mechanics of data access in
CouchDB. CouchDB follows the MapReduce style of data manipulation.
A map function emits key/value pairs based on the data in the collection, and these pairs make up the view results.
When such a view is accessed for the first time, a B-tree index is built out of this data. On
subsequent queries the data is returned from the B-tree and the underlying data is untouched.
This means every query beyond the very first one leverages the B-tree index.
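The lazy index build described above can be sketched in a few lines of Python. This is a toy model and not CouchDB's implementation: the `ToyView` class, the `by_make` map function, and the sample documents are all hypothetical, a sorted list stands in for the B-tree, and incremental updates after document changes are left out.

```python
import bisect


class ToyView:
    """Toy model of a view: a map function plus a lazily built sorted index."""

    def __init__(self, docs, map_fn):
        self.docs = docs        # underlying documents, never modified by queries
        self.map_fn = map_fn    # yields (key, value) pairs for one document
        self.keys = None        # sorted keys, standing in for the B-tree
        self.values = None      # values in the same order as the keys
        self.builds = 0         # how many times the index has been built

    def _build(self):
        rows = []
        for doc in self.docs:
            rows.extend(self.map_fn(doc))
        rows.sort(key=lambda row: row[0])   # stable sort by emitted key
        self.keys = [k for k, _ in rows]
        self.values = [v for _, v in rows]
        self.builds += 1

    def query(self, key):
        if self.keys is None:               # first access: build the index
            self._build()
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return self.values[lo:hi]           # answered from the index only


def by_make(doc):
    """Hypothetical map function: emit (make, model) for each car document."""
    if "make" in doc:
        yield doc["make"], doc["model"]


docs = [
    {"make": "toyota", "model": "corolla"},
    {"make": "honda", "model": "civic"},
    {"make": "toyota", "model": "camry"},
]
view = ToyView(docs, by_make)
first = view.query("toyota")    # triggers the index build
second = view.query("honda")    # reuses the existing index
```

After both queries, `view.builds` is still 1: only the first access pays the cost of building the index.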
The B-tree Index in CouchDB
A B-tree index scales well for large amounts of data. Despite huge data growth, the height
of a B-tree remains in single digits and allows for fast data retrieval. In CouchDB, the B-tree
implementation has specialized features like MultiVersion Concurrency Control and append-only
design.
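A rough calculation shows why the height stays small. The sketch below assumes an idealized, fully packed tree and a hypothetical fanout of 100 keys per node; both are assumptions made for illustration and are not CouchDB's actual parameters.

```python
def btree_height(num_keys, fanout):
    """Smallest number of levels h such that fanout**h >= num_keys.

    Idealized estimate: assumes every node is completely full.
    """
    height = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height


# Assumed fanout of 100 keys per node (illustrative only).
height_billion = btree_height(10**9, 100)
height_trillion = btree_height(10**12, 100)
```

Under these assumptions a billion keys need 5 levels and a trillion keys need 6, so a thousandfold growth in data adds only one or two levels to each lookup.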
MultiVersion Concurrency Control (MVCC) implies that multiple reads and writes can occur in
parallel without the need for exclusive locking. A close analogy is a distributed version control
system such as Git. All writes are sequenced, and reads are not impacted by writes.
Each CouchDB document has a _rev property that holds its most current revision value. As with optimistic locking,
writes and reads are coordinated based on the _rev value.
Therefore, the version a client sees is the latest one at the time that client starts reading the data. As documents
are modified or deleted, the index behind the view results is updated.
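The revision check can be illustrated with a small in-memory store. This is a minimal sketch under stated assumptions and not CouchDB's API: `ToyStore` and `ConflictError` are hypothetical names, and revisions are plain integers here, whereas real CouchDB revision values are strings.

```python
class ConflictError(Exception):
    """Raised when a write carries a stale revision."""


class ToyStore:
    """Toy append-only document store with optimistic revision checks."""

    def __init__(self):
        self._history = {}   # doc id -> list of revisions, oldest first

    def get(self, doc_id):
        # Readers get a copy of the latest revision; later writes do not alter it.
        return dict(self._history[doc_id][-1])

    def put(self, doc_id, body, rev=None):
        history = self._history.get(doc_id, [])
        current_rev = history[-1]["_rev"] if history else None
        if rev != current_rev:
            raise ConflictError(f"expected rev {current_rev}, got {rev}")
        new_rev = 1 if current_rev is None else current_rev + 1
        # Append a new revision instead of overwriting the old one.
        self._history[doc_id] = history + [{**body, "_id": doc_id, "_rev": new_rev}]
        return new_rev


store = ToyStore()
rev1 = store.put("car1", {"make": "toyota"})
snapshot = store.get("car1")                      # a reader starts here
rev2 = store.put("car1", {"make": "toyota", "year": 2010}, rev=rev1)

try:
    store.put("car1", {"make": "honda"}, rev=rev1)   # stale revision
    stale_write_rejected = False
except ConflictError:
    stale_write_rejected = True
```

The reader's `snapshot` still shows revision 1 after the second write, and the write that presents the stale revision is rejected rather than blocked by a lock.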
The couchdb-lucene project (https://github.com/rnewson/couchdb-lucene)
provides full-text search capability using Lucene, the open-source search engine,
and CouchDB.
INDEXING IN APACHE CASSANDRA
Column-oriented databases like HBase and Hypertable have a default row-key-based order and
index. Indexes on column values, which are often called secondary indexes, are typically not
available out of the box in these databases. HBase has some minimal support for secondary indexes.
Hypertable intends to support secondary indexes by the time of its version 1.0 release, which will be
available later this year.
Apache Cassandra is a hybrid between a column-oriented database and a pure key/value data store.
It incorporates ideas from Google Bigtable and Amazon Dynamo. Like column-oriented databases,
Cassandra supports row-key-based order and index by default. In addition, Cassandra supports
secondary indexes.
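The difference between the default row-key index and a secondary index can be sketched conceptually in Python. This is an illustration of the idea only and not Cassandra's API or storage format: `ToyColumnFamily`, the row keys, and the column names are hypothetical.

```python
from collections import defaultdict


class ToyColumnFamily:
    """Toy column family: rows looked up by row key, plus optional
    secondary indexes that map a column value back to row keys."""

    def __init__(self, indexed_columns=()):
        self.rows = {}   # row key -> {column name: value}
        self.indexes = {col: defaultdict(set) for col in indexed_columns}

    def insert(self, row_key, columns):
        old = self.rows.get(row_key, {})
        merged = {**old, **columns}
        # Keep every secondary index in step with the row data.
        for col, index in self.indexes.items():
            if col in old:
                index[old[col]].discard(row_key)
            if col in merged:
                index[merged[col]].add(row_key)
        self.rows[row_key] = merged

    def get(self, row_key):
        """Default access path: look up by row key."""
        return self.rows[row_key]

    def find_by(self, column, value):
        """Secondary access path: look up row keys by an indexed column value."""
        return sorted(self.indexes[column].get(value, ()))


cars = ToyColumnFamily(indexed_columns=["make"])
cars.insert("prius", {"make": "toyota", "model": "Prius"})
cars.insert("camry", {"make": "toyota", "model": "Camry"})
cars.insert("civic", {"make": "honda", "model": "Civic"})

toyotas = cars.find_by("make", "toyota")
```

Without the index on `make`, answering the same question would mean scanning every row; with it, the lookup goes straight from the column value to the matching row keys.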
Secondary index support in Cassandra is best explained using a simple example. You may recall a
Cassandra database example with the CarDataStore keyspace and the Cars column-family from
Chapter 2. The same example is revisited here to explain support for secondary indexes.