The Nonrelational Landscape - Cassandra: The Definitive Guide


▪ Storage : Custom disk-based storage

▪ Production use : Box.net, ThoughtWorks

▪ Additional features : Because it is a graph database, Neo4J is well suited to semantic web applications. It lets you execute SPARQL Protocol and RDF Query Language (SPARQL) queries against Resource Description Framework (RDF) data, and it acts as a partial Web Ontology Language (OWL) store.

Integration with Apache Lucene/Solr is available to store external indexes and perform fast global searches. An index in a distributed database can be thought of as a dictionary: a direct pointer from a key to a value.

As of version 1.1, Neo4J features an event framework.

Key-Value Stores and Distributed Hashtables

In a relational model, we tend to first consider the tables that our domain requires, then think of how we can normalize the tables to avoid duplicate data. The tables with their defined columns and the relationships between the tables become our schema.

In a key-value store, however, you typically don't define a schema as such. Instead, your domain becomes a bucket into which you can drop data items; the data items are keys that have a set of attributes. All data relevant to a key is therefore stored with that key, in sharp contrast to the normalized model prized in relational databases: data is frequently duplicated. There are some variations here, though, and some conceptual overlap with the columnar databases.
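To make the bucket idea concrete, here is a minimal sketch that uses a plain in-memory Python dictionary as a stand-in for a bucket. The names `users`, `put`, and `get` are hypothetical and chosen only for illustration; they are not the API of any particular key-value store.

```python
# A hypothetical in-memory "bucket": each key maps to its own set of attributes.
users = {}

def put(bucket, key, **attributes):
    """Store every attribute for a key together; no schema is declared up front."""
    bucket[key] = dict(attributes)

def get(bucket, key):
    """Return the attributes stored under key, or None if the key is absent."""
    return bucket.get(key)

# The two items carry different attribute sets, and the city and state
# values are simply duplicated rather than normalized into another table.
put(users, "alice", email="alice@example.com", city="Austin", state="TX")
put(users, "bob", email="bob@example.com", city="Austin", state="TX", phone="555-0100")
```

Notice that nothing stops one item from carrying a `phone` attribute while another omits it, and that the repeated city and state values are accepted as a cost of keeping everything for a key in one place.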

Another contrast is one of modeling. When working with relational databases, we tend to think hard about the schema, trusting that any question we want to ask the database will be answerable. Because the questions (the queries) are secondary in this model, they can become very complex. You've surely seen elaborate SQL statements that use several joins, subqueries, aggregate functions, temporary tables, and so forth. In the columnar model, however, we tend to think of the query first, and the queries we'll execute help dictate the design of the buckets we'll need. The assumption in columnar databases that supports this approach is that we want replication so that the database remains available, and that data duplication is acceptable because disk space is inexpensive.
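As a rough illustration of query-first design, suppose we know in advance that we will ask two questions: "which user has this email address?" and "which users live in this city?" The sketch below, again using in-memory Python dictionaries as stand-ins, keeps one bucket per question and writes each record to both. The names `users_by_email`, `users_by_city`, `add_user`, and `names_in_city` are assumptions made for this example only.

```python
from collections import defaultdict

# One hypothetical bucket per query we intend to run.
users_by_email = {}                # answers: "who has this email address?"
users_by_city = defaultdict(list)  # answers: "who lives in this city?"

def add_user(name, email, city):
    """Write the same record to every bucket that serves a known query."""
    record = {"name": name, "email": email, "city": city}
    users_by_email[email] = record
    # Store a separate copy: the duplication is deliberate.
    users_by_city[city].append(dict(record))

def names_in_city(city):
    """Answer the city query with a single key lookup, no join required."""
    return [record["name"] for record in users_by_city.get(city, [])]

add_user("Alice", "alice@example.com", "Austin")
add_user("Bob", "bob@example.com", "Austin")
add_user("Carol", "carol@example.com", "Boston")
```

Each query is now a single lookup by key. The price is that every write touches both buckets and the data is stored twice, which is the trade-off the paragraph above describes.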

Data integrity is another point of difference. Data integrity is the extent to which the data in an application is complete and consistent. Relational databases have some built-in capabilities to help ensure data integrity, such as primary keys (which ensure entity integrity) and foreign key constraints (which ensure referential integrity). In a key-value store, however, the responsibility for data integrity resides entirely with the application.
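What "resides entirely with the application" might look like in practice is sketched below. Because the store itself will accept any write, the application code performs the checks that a primary key or foreign key constraint would otherwise perform. The buckets, the function names, and the `IntegrityError` class are all hypothetical, and plain dictionaries stand in for a real key-value store.

```python
# Hypothetical buckets; a plain dict accepts any write without complaint.
customers = {}
orders = {}

class IntegrityError(Exception):
    """Raised by the application when a write would break its own rules."""

def add_customer(customer_id, name):
    # Entity integrity: the application refuses a duplicate key.
    if customer_id in customers:
        raise IntegrityError(f"customer {customer_id!r} already exists")
    customers[customer_id] = {"name": name}

def add_order(order_id, customer_id, total):
    # Entity integrity for orders.
    if order_id in orders:
        raise IntegrityError(f"order {order_id!r} already exists")
    # Referential integrity: the referenced customer must already exist.
    if customer_id not in customers:
        raise IntegrityError(f"unknown customer {customer_id!r}")
    orders[order_id] = {"customer_id": customer_id, "total": total}

add_customer("c1", "Alice")
add_order("o1", "c1", 25.0)
```

If any code path writes to `orders` directly and skips `add_order`, nothing in the store will catch the mistake, which is exactly the risk of moving these checks into the application.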
