Introduction to Cassandra - Practical Cassandra

ilar to relational tables) do not need to have matching columns within a row. Even rows within a ColumnFamily are not required to always follow the same naming schema. The options are available, but data patterns are not strictly enforced. Data can also be added in very high volumes at very high velocities, and Cassandra will determine the correct version of a piece of data by comparing the timestamps at which each version was written: the most recent write wins.
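To make the timestamp-based resolution concrete, here is a minimal sketch in Python of last-write-wins reconciliation. It is an illustration only, not Cassandra's actual implementation: the `Cell`, `reconcile`, and `merge_rows` names are hypothetical, timestamps are plain integers, and the tie-break on equal timestamps is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Cell:
    """One column value plus the timestamp of the write that produced it."""
    value: str
    timestamp: int  # assumed to be supplied by the writer, e.g. microseconds


def reconcile(a: Cell, b: Cell) -> Cell:
    """Pick the newer of two versions of the same column (last write wins)."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Assumption for this sketch: on equal timestamps, pick the larger value
    # so that every caller arrives at the same answer.
    return a if a.value >= b.value else b


def merge_rows(*versions: Dict[str, Cell]) -> Dict[str, Cell]:
    """Merge several versions of one row, column by column.

    The versions do not need to contain the same columns, which mirrors
    the loosely enforced schema described above.
    """
    merged: Dict[str, Cell] = {}
    for row in versions:
        for name, cell in row.items():
            current = merged.get(name)
            merged[name] = cell if current is None else reconcile(current, cell)
    return merged


if __name__ == "__main__":
    older = {"name": Cell("Ada", 100), "email": Cell("old@example.com", 100)}
    newer = {"email": Cell("new@example.com", 200)}
    for column, cell in sorted(merge_rows(older, newer).items()):
        print(column, cell.value, cell.timestamp)
```

In this sketch the `email` column takes the value written at timestamp 200, while `name`, which appears in only one version, is carried over unchanged.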

Architecturally, Cassandra's decentralized design eliminates any single point of failure and gives every node in the cluster the same role, so any node can serve any request. Cassandra also supports replication, including replication across multiple data centers. Since replication strategies are configurable, you can set up your distribution architecture to be as centralized or spread out, or as redundant or fail-safe, as you would like. Because data is automatically replicated to nodes, downed or faulty nodes are easily replaceable. New nodes can be added at will, without downtime, to increase read and write throughput or even just availability. The consistency levels are tunable, which lets the application decide, for each read or write, how much of the cluster's resources to spend on data assurance.
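The trade-off behind tunable consistency can be sketched with a little arithmetic. The Python below is an illustrative model, not a driver API: the function names are hypothetical, and only a few of Cassandra's consistency levels (ONE, TWO, THREE, QUORUM, ALL) are covered. It relies on the commonly cited rule that a read is guaranteed to see the latest acknowledged write when the replicas consulted for the read and for the write must overlap, that is, when R + W > replication factor.

```python
def quorum(replication_factor: int) -> int:
    """A majority of the replicas for a piece of data."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    required = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    if level not in required:
        raise ValueError(f"level not covered by this sketch: {level}")
    if required[level] > replication_factor:
        raise ValueError(f"{level} needs more replicas than exist")
    return required[level]


def reads_see_latest_write(read_level: str, write_level: str,
                           replication_factor: int) -> bool:
    """True when the read and write replica sets are forced to overlap."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read, write, reads_see_latest_write(read, write, rf))
```

With a replication factor of 3, reading and writing at ONE touches the fewest replicas but does not force an overlap, while QUORUM for both (2 + 2 > 3) does, at the cost of waiting on more nodes for each request.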

Cassandra also has a growing ecosystem. There are monitoring systems like OpsCenter to help you see the health of your cluster and manage common administration tasks. There are drivers for many of the major languages. Cassandra now comes with integration points for Hadoop and MapReduce support, full-text search with Solr, and Apache Pig and Hive support. There is even a SQL-like query language called CQL, or Cassandra Query Language, to help with data modeling and access patterns.

History of Cassandra

Apache Cassandra was originally developed at Facebook in 2008 to power Facebook's in-box search feature. The original authors were Avinash Lakshman, who is also one of the authors of the Amazon Dynamo paper, and Prashant Malik. After being in production at Facebook for a while, Cassandra was released as an open-source project on Google Code in July of 2008. In March of 2009, it was accepted by the Apache Software Foundation as an incubator project. In February of 2010, it became a top-level Apache project.

As of the time of this writing, the most recent version of Apache Cassandra is the 1.2 series. Cassandra has come a long way since the first major release after its graduation to a top-level Apache project. It has picked up support for Hadoop, text search integration through Solr, CQL, zero-downtime upgrades, virtual nodes (vnodes), and self-tuning caches, just to name a few of the major features. Cas-
