Storing and Processing Time Series Data - Time Series Databases

Why Relational Databases Aren't Quite Right

At this point, it is fair to ask why a relational database couldn't handle nearly the same ingest and analysis load as is possible by using a hybrid schema with MapR-DB or HBase. This question is of particular interest when only blob data is inserted and no wide table data is used, because modern relational databases often have blob or array types.

The answer to this question is that a relational database running this way will provide reasonable, but not stellar, ingestion and retrieval rates. The real problem with using a relational database for a system like this is not performance, per se. Instead, the problem is that by moving to a blob style of data storage, you are giving up almost all of the virtues of a relational system. Additionally, SQL doesn't provide a good abstraction for hiding the details of accessing a blob-based storage format. SQL also won't be able to process the data in any reasonable way, and special features like multirow transactions won't be used at all. Transactions, in particular, are a problem here because even though they wouldn't be used, the feature remains, at a cost. The requirement that a relational database support multirow transactions makes these databases much more difficult to scale to multinode configurations. Even getting really high performance out of a single node can require using a high-cost system like Oracle. With a NoSQL system like Apache HBase or MapR-DB instead, you can simply add additional hardware to get more performance.
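To make the point about blob storage concrete, here is a minimal sketch using Python's built-in sqlite3 and struct modules. The table name, column names, and the packing format (a 16-bit second offset followed by a 64-bit float) are assumptions chosen for illustration; they are not the format used by OpenTSDB, HBase, or MapR-DB. The sketch shows that once samples are packed into a blob, SQL can find the row but cannot aggregate the values inside it.

```python
import sqlite3
import struct

# Hypothetical schema: one row per (metric, hour), with all samples for that
# hour packed into a single BLOB column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE series_blob ("
    " metric TEXT, window_start INTEGER, samples BLOB,"
    " PRIMARY KEY (metric, window_start))"
)

def pack_samples(samples):
    """Pack (offset_seconds, value) pairs as little-endian uint16 + float64."""
    return b"".join(struct.pack("<Hd", off, val) for off, val in samples)

def unpack_samples(blob):
    """Inverse of pack_samples; returns a list of (offset_seconds, value)."""
    return list(struct.iter_unpack("<Hd", blob))

samples = [(0, 20.5), (60, 21.0), (120, 21.5)]
conn.execute(
    "INSERT INTO series_blob VALUES (?, ?, ?)",
    ("cpu.temp", 1_400_000_400, pack_samples(samples)),
)

# SQL can locate the row, but it sees the samples only as opaque bytes.
(blob_len,) = conn.execute(
    "SELECT length(samples) FROM series_blob WHERE metric = ?", ("cpu.temp",)
).fetchone()

# Any aggregation has to happen in application code after decoding.
(blob,) = conn.execute(
    "SELECT samples FROM series_blob WHERE metric = ?", ("cpu.temp",)
).fetchone()
decoded = unpack_samples(blob)
mean_value = sum(value for _, value in decoded) / len(decoded)
```

In this sketch the database acts as little more than a key-value store: the average is computed in Python, not by an SQL `AVG`, which is the loss of relational virtues the text describes.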

This pattern of paying a penalty for unused features that get in the way of scaling a system happens in a number of high-performance systems. It is common that the measures that must be taken to scale a system inherently negate the virtues of a conventional relational database, and if you attempt to apply them to a relational database, you still do not get the scaling you desire. In such cases, moving to an alternative database like HBase or MapR-DB can have substantial benefits because you gain both performance and scalability.

Hybrid Design: Where Can I Get One?

These hybrid wide/blob table designs can be very alluring. Their promise of enormous performance levels is exciting, and the possibility that they can run on fault-tolerant, Hadoop-based systems such as the MapR distribution makes them attractive from an operational point of view as well. These new approaches are not speculation; they have been built and they do provide stunning results. The description we've presented here so far, however, is largely conceptual. What about real implementations? The next chapter addresses exactly how you can realize these new designs by describing how you can use OpenTSDB, an open source time series database tool, along with special open source MapR extensions. The result is a
