Query versus Processing in Aggregate Stores
In the preceding sections we've highlighted the similarities and differences between the
document, key-value, and column family data models. On balance, the similarities have
been greater than the differences. In fact, the similarities are so great that the three types
are sometimes referred to jointly as aggregate stores. Aggregate stores persist standalone
complex records that reflect the Domain-Driven Design notion of an aggregate.
Each aggregate store has a different storage strategy, yet they all have a great deal in
common when it comes to querying data. For simple ad hoc queries, each tends to
provide features such as indexing, simple document linking, or a query language. For
more complex queries, applications commonly identify and extract a subset of data from
the store before piping it through some external processing infrastructure such as a
MapReduce framework. This is done when the necessary deep insight cannot be
generated simply by examining individual aggregates.
MapReduce, like BigTable, is another technique that comes to us from Google. The most
prevalent open source implementation of MapReduce is Apache Hadoop and its
attendant ecosystem.
MapReduce is a parallel programming model that splits data and operates on it in parallel
before gathering it back together and aggregating it to provide focused information.
If, for example, we wanted to use it to count how many American artists there are in a
recording artists database, we'd extract all the artist records and discard the non-
American ones in the map phase, and then count the remaining records in the reduce
phase.
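To make the two phases concrete, the following is a minimal sketch of such a job written against the Hadoop MapReduce Java API. It assumes a hypothetical export of the artist records as comma-separated lines with the nationality in the second field; the driver code that configures and submits the job is omitted. The map phase emits a count of 1 for each American artist and discards everything else, and the reduce phase simply sums the surviving counts.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class AmericanArtistCount {

    // Map phase: keep only American artists, emitting a 1 for each record kept.
    public static class ArtistMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final Text KEY = new Text("american-artists");
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable offset, Text record, Context context)
                throws IOException, InterruptedException {
            // Hypothetical record layout: "name,nationality,..."
            String[] fields = record.toString().split(",");
            if (fields.length > 1 && "American".equalsIgnoreCase(fields[1].trim())) {
                context.write(KEY, ONE);
            }
        }
    }

    // Reduce phase: sum the counts gathered from all of the mappers.
    public static class CountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable count : counts) {
                total += count.get();
            }
            context.write(key, new IntWritable(total));
        }
    }
}
```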
Even with a lot of machines and a fast network infrastructure, MapReduce can exhibit
considerable latency. Normally, we'd use the features of the data store to produce a more
focused dataset, perhaps using indexes or other ad hoc queries, and then run MapReduce
over that smaller dataset to arrive at our answer.
Aggregate stores are not built to deal with highly connected data. We can use them for
that purpose, but we have to add code to fill in where the underlying data model leaves
off, resulting in a development experience that is far from seamless, and operational
characteristics that are, generally speaking, not very fast, particularly as the number of
hops (or “degree”) of the query increases. Aggregate stores may be good at storing data
that's big, but they aren't generally that great at dealing with problems that require an
understanding of how things are connected.
Graph Databases
A graph database is an online (“real-time”) database management system with Create,
Read, Update, and Delete (CRUD) methods that expose a graph data model. Graph
databases are generally built for use in transactional (OLTP) systems. Accordingly, they are
normally optimized for transactional performance, and engineered with transactional
integrity and operational availability in mind.
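As an illustration of what CRUD methods look like when they expose a graph data model, here is a minimal sketch using the embedded Java API of Neo4j 3.x, one example of such a database. The store path, labels, property names, and relationship type are invented for this snippet, and error handling is omitted.

```java
import java.io.File;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class GraphCrudSketch {

    public static void main(String[] args) {
        GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase(new File("artists.db"));

        // Create: add two nodes and connect them, all inside a transaction.
        try (Transaction tx = db.beginTx()) {
            Node artist = db.createNode(Label.label("Artist"));
            artist.setProperty("name", "Miles Davis");

            Node country = db.createNode(Label.label("Country"));
            country.setProperty("name", "United States");

            artist.createRelationshipTo(country, RelationshipType.withName("BORN_IN"));
            tx.success();
        }

        // Read: look an artist up by property and follow its relationships.
        try (Transaction tx = db.beginTx()) {
            Node artist = db.findNode(Label.label("Artist"), "name", "Miles Davis");
            artist.getRelationships().forEach(rel ->
                    System.out.println(rel.getType().name() + " -> "
                            + rel.getEndNode().getProperty("name")));
            tx.success();
        }

        db.shutdown();
    }
}
```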