Storage: Custom disk-based storage
Production use: Box.net, ThoughtWorks
Additional features: Because it is a graph database, Neo4J can be used to good advantage
with semantic web applications. It allows you to execute SPARQL Protocol and RDF Query
Language (SPARQL) queries for interacting with Resource Description Framework (RDF) data,
and it acts as a partial Web Ontology Language (OWL) store.
Integration with Apache Lucene/Solr is available to store external indexes and perform fast
global searches. An index in distributed databases can be thought of as a dictionary: a direct
pointer from a key to a value.
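To make the dictionary analogy concrete, here is a minimal Python sketch of an inverted index built on a plain dictionary. The documents, their ids, and the `build_index` helper are hypothetical; the sketch illustrates only the key-to-value idea and is not how Lucene or Solr actually lay out an index on disk.

```python
# A minimal sketch of an index as a dictionary: a direct pointer from a key
# (here, a word) to the values it locates (here, hypothetical document ids).
# Illustrative only; this is not Lucene's or Solr's storage format.
from collections import defaultdict

documents = {
    "doc1": "graph databases store nodes and relationships",
    "doc2": "key value stores map keys to values",
    "doc3": "graph traversal follows relationships",
}

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

index = build_index(documents)

# A lookup is a single dictionary access rather than a scan of every document.
print(sorted(index["relationships"]))       # ['doc1', 'doc3']
print(sorted(index.get("schema", set())))   # []
```

The cost of building the index is paid once at write time, which is what makes the later "global search" a constant-time lookup per key.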
As of version 1.1, Neo4J features an event framework.
Key-Value Stores and Distributed Hashtables
In a relational model, we tend to first consider the tables that our domain requires, then think of
how we can normalize the tables to avoid duplicate data. The tables with their defined columns
and the relationships between the tables become our schema.
In a key-value store, however, you typically don't define a schema as such. Instead, your domain
becomes a bucket into which you drop data items; the data items are keys that have a set of
attributes. All data relevant to a key is therefore stored with that key, in sharp contrast to
the normalized model prized in relational databases: data is frequently duplicated. There are
some variations here, though, and some conceptual overlap with columnar databases.
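The difference between the two layouts can be sketched in a few lines of Python. This is a minimal illustration, assuming plain dictionaries stand in for both the relational tables and the key-value bucket; all names (`customers`, `orders`, `customer_id`, the `order:` key prefix) are hypothetical.

```python
# Normalized layout: the customer's city is stored once and referenced by id.
customers = {1: {"name": "Alice", "city": "Oslo"}}
orders = [
    {"order_id": 100, "customer_id": 1, "item": "book"},
    {"order_id": 101, "customer_id": 1, "item": "lamp"},
]

# Key-value layout: each key carries every attribute needed to answer a read,
# so the customer's name and city are copied into each order's value.
bucket = {}
for order in orders:
    customer = customers[order["customer_id"]]
    bucket[f"order:{order['order_id']}"] = {
        "item": order["item"],
        "customer_name": customer["name"],
        "customer_city": customer["city"],
    }

# One lookup returns everything; no join is needed.
print(bucket["order:101"])
# {'item': 'lamp', 'customer_name': 'Alice', 'customer_city': 'Oslo'}

# The price: "Oslo" is now stored once per order instead of once per customer.
print(sum(v["customer_city"] == "Oslo" for v in bucket.values()))  # 2
```

If Alice moves, the normalized layout needs one update, while the bucket needs one update per order that mentions her.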
Another contrast is one of modeling. When working with relational databases, we tend to think
hard about the schema, trusting that any question we want to ask the database will be answerable.
Because the questions (the queries) are secondary in this model, they can become very complex.
You've surely seen elaborate SQL statements that use several joins, subqueries, aggregate
functions, temporary tables, and so forth. In the columnar model, however, we tend to think of
the query first, and the queries we'll execute help dictate the design of the buckets we'll need.
The assumption in columnar databases that supports this approach is that we want replication so
that the database stays available, and that data duplication is acceptable because disk space is
inexpensive.
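Query-first design can be sketched as follows. This is a minimal illustration, assuming two hypothetical queries the application must answer ("orders by customer" and "orders by item") and Python dictionaries standing in for the buckets; `save_order` and the bucket names are invented for the example.

```python
from collections import defaultdict

# One bucket per query, keyed by exactly what that query looks up.
orders_by_customer = defaultdict(list)  # answers "all orders for customer X"
orders_by_item = defaultdict(list)      # answers "all orders containing item Y"

def save_order(order_id, customer, item):
    """Write the order into every bucket that some query reads from.

    A separate copy goes into each bucket, mirroring how a store would
    persist the same data twice on disk.
    """
    record = {"order_id": order_id, "customer": customer, "item": item}
    orders_by_customer[customer].append(dict(record))
    orders_by_item[item].append(dict(record))

save_order(100, "alice", "book")
save_order(101, "alice", "lamp")
save_order(102, "bob", "book")

# Each query is now a single key lookup rather than a filtered scan or a join.
print([o["order_id"] for o in orders_by_customer["alice"]])  # [100, 101]
print([o["order_id"] for o in orders_by_item["book"]])       # [100, 102]
```

Adding a third query would mean adding a third bucket and another write in `save_order`, which is the trade the text describes: more writes and more disk in exchange for simple reads.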
Data integrity is another point of difference. Data integrity is the extent to which the data in
an application is complete and consistent. Relational databases have some built-in capabilities to
help ensure data integrity, such as primary keys (which ensure entity integrity) and foreign key
constraints (which ensure referential integrity). In a key-value store, however, the responsibility
for data integrity resides entirely with the application.
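What "the responsibility resides with the application" means in practice can be sketched in Python. This is a minimal illustration, assuming a plain dictionary stands in for the key-value store; `put_customer`, `put_order`, and `IntegrityError` are hypothetical application code, not part of any store's API. The store itself would accept any value under any key.

```python
store = {}  # stands in for a key-value store that enforces nothing

class IntegrityError(Exception):
    """Raised when a write would break a rule the application enforces."""

def put_customer(customer_id, name):
    key = f"customer:{customer_id}"
    # Entity integrity: refuse to silently overwrite an existing customer.
    if key in store:
        raise IntegrityError(f"duplicate key {key}")
    store[key] = {"name": name}

def put_order(order_id, customer_id, item):
    # Referential integrity: the referenced customer must already exist.
    if f"customer:{customer_id}" not in store:
        raise IntegrityError(f"unknown customer {customer_id}")
    store[f"order:{order_id}"] = {"customer_id": customer_id, "item": item}

put_customer(1, "Alice")
put_order(100, 1, "book")      # accepted
try:
    put_order(101, 2, "lamp")  # rejected: customer 2 does not exist
except IntegrityError as err:
    print(err)                 # unknown customer 2
```

Note that these checks are not atomic: without support from the store, another client could delete customer 1 between the existence check and the write, which is one reason integrity is harder to guarantee outside a relational database.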