Designing Real-Time Streaming Architectures - Real-Time Analytics

The most common styles of NoSQL databases are the various forms of
persistent key-value stores. They range from single-machine master-slave
data stores, such as Redis, to fully distributed, eventually consistent
stores, such as Cassandra. Their fundamental data structure is a key paired
with an arbitrary byte array value, but most have built some level of
abstraction on top of this core entity. Some, such as Cassandra, even extend
the abstraction to offer a SQL-like language that includes schemas and
familiar-looking statements, although these languages do not support many
features of SQL.
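As a rough illustration of layering an abstraction over a raw byte-array store, the sketch below wraps an in-memory dictionary. The names ByteStore and RecordStore, the "table:id" key layout, and the JSON encoding are all assumptions made for this example; they are not the API or storage format of Redis, Cassandra, or any other real store.

```python
import json
from typing import Dict, Optional


class ByteStore:
    """Hypothetical in-memory stand-in for a key-value store: bytes in, bytes out."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)


class RecordStore:
    """Thin abstraction mapping named records with fields onto raw byte values."""

    def __init__(self, store: ByteStore) -> None:
        self._store = store

    def save(self, table: str, row_id: str, fields: dict) -> None:
        # Assumed key layout: "<table>:<row_id>"; the value is JSON-encoded bytes.
        key = f"{table}:{row_id}".encode("utf-8")
        self._store.put(key, json.dumps(fields, sort_keys=True).encode("utf-8"))

    def load(self, table: str, row_id: str) -> Optional[dict]:
        raw = self._store.get(f"{table}:{row_id}".encode("utf-8"))
        return None if raw is None else json.loads(raw.decode("utf-8"))


raw_store = ByteStore()
records = RecordStore(raw_store)
records.save("users", "42", {"name": "Ada", "visits": 3})
loaded = records.load("users", "42")
```

The underlying store still sees only opaque keys and values; everything resembling a table or a field lives in the wrapper, which is one reason such layers tend to support only a subset of what a relational engine offers.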

The NoSQL database world also includes a variety of hybrid data stores,
such as MongoDB. Rather than being a key-value store, MongoDB is a form
of indexed document store. In many ways it is closer to a search engine like
Google than it is to a relational database. Like the key-value stores, it has
a very limited query language. It does not have a schema; instead it uses an
optimized JSON representation (BSON) for documents that allows for rich
structure. Unlike most of the key-value stores, it also offers abstractions
for references between documents.
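The idea of schemaless, indexed documents that refer to one another can be sketched with plain Python dictionaries. This is only a toy model: the collections, the author_ref field, and the build_index and resolve helpers are hypothetical names chosen for illustration and do not use or mirror the MongoDB driver API.

```python
from typing import Any, Dict, List

# Hypothetical in-memory "collections": document id -> schemaless document.
authors: Dict[str, Dict[str, Any]] = {
    "a1": {"_id": "a1", "name": "Ada", "tags": ["streaming", "storage"]},
}
posts: Dict[str, Dict[str, Any]] = {
    "p1": {"_id": "p1", "title": "Counters", "author_ref": "a1",
           "meta": {"views": 10, "langs": ["en"]}},
    # Documents in one collection need not share fields: no "meta" here.
    "p2": {"_id": "p2", "title": "Queues", "author_ref": "a1"},
}


def build_index(collection: Dict[str, Dict[str, Any]], field: str) -> Dict[Any, List[str]]:
    """Secondary index: field value -> ids of the documents holding that value."""
    index: Dict[Any, List[str]] = {}
    for doc_id, doc in collection.items():
        if field in doc:
            index.setdefault(doc[field], []).append(doc_id)
    return index


def resolve(doc: Dict[str, Any], ref_field: str, target: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Follow a reference stored as an id in one document to another document."""
    return target[doc[ref_field]]


by_author = build_index(posts, "author_ref")
post_ids = by_author["a1"]
author_name = resolve(posts["p1"], "author_ref", authors)["name"]
```

The index answers "which posts belong to this author" without scanning every document, while the reference is just a stored id that the application (or the store) follows on demand.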

Along with the simplification of query languages and schema maintenance
comes a relaxation of the aforementioned ACID requirements. Atomicity and
consistency, in particular, are often sacrificed in these data stores. By
relaxing the consistency constraint to one of “eventual consistency,” these
stores gain some performance through reduced bookkeeping, along with a much
simpler model for distributing themselves across a large number of machines
that may be physically quite distant. In practice, for streaming
applications, it is generally not necessary that each client have the same
view of the data at the same time. The principal problem arises when two
physically separate copies of the database attempt to modify the same piece
of state. Resolving this problem is tricky, but it is possible and is
discussed in detail in Chapter 3, “Service Configuration and Coordination.”
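To make the concurrent-modification problem concrete, the sketch below has two replicas update the same value independently and then reconciles them with a naive last-writer-wins rule. That rule is only a simple stand-in chosen for this example, not the resolution approach covered in Chapter 3, and the Versioned type and replica names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: int
    timestamp: int  # logical write time; assumed comparable across replicas


def last_writer_wins(a: Versioned, b: Versioned) -> Versioned:
    """Naive reconciliation: keep whichever copy was written most recently."""
    return a if a.timestamp >= b.timestamp else b


# Both replicas start from the same state.
start = Versioned(value=10, timestamp=0)

# Each replica applies a local update before hearing about the other's.
replica_east = Versioned(value=start.value + 5, timestamp=1)
replica_west = Versioned(value=start.value + 3, timestamp=2)

merged = last_writer_wins(replica_east, replica_west)
intended = start.value + 5 + 3  # what a single, serialized copy would hold
```

Here the merged value reflects only the later write, so the other replica's update is silently lost. Readers seeing slightly stale data is usually tolerable in a streaming application; losing a write is not, which is why this case needs explicit coordination.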

Relaxing the atomicity requirement also usually results in a performance
gain for the data store, and maximum performance is the ultimate goal of all
of these data stores. Most of them maintain atomicity in some lightweight
way, usually for the special case of counter types. Maintaining atomic
counters, as it happens, is not too difficult and is also a common use case,
leading most data stores to implement some form of counter.
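An in-process analogue of an atomic counter can be sketched with a lock around the read-modify-write step. This is a minimal illustration under the assumption of a single Python process with several threads; the AtomicCounter class and hammer helper are hypothetical, and a real data store would perform the equivalent operation on the server side.

```python
import threading


class AtomicCounter:
    """Counter whose increment is a single indivisible read-modify-write."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self, amount: int = 1) -> int:
        # The lock ensures no other thread can interleave between read and write.
        with self._lock:
            self._value += amount
            return self._value

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def hammer(counter: AtomicCounter, times: int) -> None:
    """Simulate one client issuing many increments."""
    for _ in range(times):
        counter.increment()


counter = AtomicCounter()
workers = [threading.Thread(target=hammer, args=(counter, 1000)) for _ in range(8)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
total = counter.value
```

Because the only operation offered is "add this amount," clients never need to read a value, change it, and write it back, which is what makes counters comparatively cheap to keep atomic.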

Each of the stores discussed in this topic has different strengths and
weaknesses. There is even a place in real-time applications for traditional
relational databases. Different applications will perform better with one
