Query versus Processing in Aggregate Stores
In the preceding sections we've highlighted the similarities and differences between the
document, key-value, and column family data models. On balance, the similarities have
been greater than the differences. In fact, the similarities are so great that the three types
are sometimes referred to jointly as aggregate stores. Aggregate stores persist standalone
complex records that reflect the Domain-Driven Design notion of an aggregate.
Each aggregate store has a different storage strategy, yet they all have a great deal in
common when it comes to querying data. For simple ad hoc queries, each tends to
provide features such as indexing, simple document linking, or a query language. For
more complex queries, applications commonly identify and extract a subset of data from
the store before piping it through some external processing infrastructure such as a
MapReduce framework. This is done when the necessary deep insight cannot be
generated simply by examining individual aggregates.
MapReduce, like BigTable, is another technique that comes to us from Google. The most
prevalent open source implementation of MapReduce is Apache Hadoop and its
attendant ecosystem.
MapReduce is a parallel programming model that splits data and operates on it in parallel
before gathering it back together and aggregating it to provide focused information.
If, for example, we wanted to use it to count how many American artists there are in a
recording artists database, we'd extract all the artist records and discard the non-
American ones in the map phase, and then count the remaining records in the reduce
phase.
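To make the two phases concrete, the following is a minimal sketch of such a job written against the Hadoop MapReduce Java API. It assumes a hypothetical export of the artist records as comma-separated lines with the nationality in the second field; the driver code that configures and submits the job is omitted. The map phase emits a count of 1 for each American artist and discards everything else, and the reduce phase simply sums the surviving counts.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class AmericanArtistCount {

    // Map phase: keep only American artists, emitting a 1 for each record kept.
    public static class ArtistMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final Text KEY = new Text("american-artists");
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable offset, Text record, Context context)
                throws IOException, InterruptedException {
            // Hypothetical record layout: "name,nationality,..."
            String[] fields = record.toString().split(",");
            if (fields.length > 1 && "American".equalsIgnoreCase(fields[1].trim())) {
                context.write(KEY, ONE);
            }
        }
    }

    // Reduce phase: sum the counts gathered from all of the mappers.
    public static class CountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable count : counts) {
                total += count.get();
            }
            context.write(key, new IntWritable(total));
        }
    }
}
```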
Even with a lot of machines and a fast network infrastructure, MapReduce can exhibit
considerable latency. Normally, we'd use the features of the data store to produce a more
focused dataset, perhaps using indexes or other ad hoc queries, and then run MapReduce
over that smaller dataset to arrive at our answer.
Aggregate stores are not built to deal with highly connected data. We can use them for
that purpose, but we have to add code to fill in where the underlying data model leaves
off, resulting in a development experience that is far from seamless, and operational
characteristics that are, generally speaking, not very fast, particularly as the number of
hops (or “degree”) of the query increases. Aggregate stores may be good at storing data
that's big, but they aren't generally that great at dealing with problems that require an
understanding of how things are connected.
Graph Databases
A graph database is an online (“real-time”) database management system with Create,
Read, Update, and Delete (CRUD) methods that expose a graph data model. Graph
databases are generally built for use in transactional (OLTP) systems. Accordingly, they are
normally optimized for transactional performance, and engineered with transactional
integrity and operational availability in mind.
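As an illustration of what CRUD methods look like when they expose a graph data model, here is a minimal sketch using the embedded Java API of Neo4j 3.x, one example of such a database. The store path, labels, property names, and relationship type are invented for this snippet, and error handling is omitted.

```java
import java.io.File;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class GraphCrudSketch {

    public static void main(String[] args) {
        GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase(new File("artists.db"));

        // Create: add two nodes and connect them, all inside a transaction.
        try (Transaction tx = db.beginTx()) {
            Node artist = db.createNode(Label.label("Artist"));
            artist.setProperty("name", "Miles Davis");

            Node country = db.createNode(Label.label("Country"));
            country.setProperty("name", "United States");

            artist.createRelationshipTo(country, RelationshipType.withName("BORN_IN"));
            tx.success();
        }

        // Read: look an artist up by property and follow its relationships.
        try (Transaction tx = db.beginTx()) {
            Node artist = db.findNode(Label.label("Artist"), "name", "Miles Davis");
            artist.getRelationships().forEach(rel ->
                    System.out.println(rel.getType().name() + " -> "
                            + rel.getEndNode().getProperty("name")));
            tx.success();
        }

        db.shutdown();
    }
}
```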