Why Relational Databases Aren't Quite Right
At this point, it is fair to ask why a relational database couldn't handle nearly the same ingest
and analysis load as is possible by using a hybrid schema with MapR-DB or HBase. This
question is of particular interest when only blob data is inserted and no wide table data is
used, because modern relational databases often have blob or array types.
The answer is that a relational database running this way will provide reasonable, but not
stellar, ingestion and retrieval rates. The real problem with using a relational database for
a system like this is not performance, per se. Instead, the problem is that by moving to a blob
style of data storage, you are giving up almost all of the virtues of a relational system.
Additionally, SQL doesn't provide a good abstraction for hiding the details of accessing a
blob-based storage format. SQL also can't process the contents of the blobs in any reasonable
way, and special features like multirow transactions won't be used at all.
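To make the point about blobs concrete, here is a minimal sketch that uses SQLite as a stand-in for a relational database. The table name ts_blob, the one-row-per-time-window layout, and the packed sample format are all assumptions made for illustration; they are not taken from MapR-DB, HBase, or any particular product. SQL can find the row and report the size of the blob, but any computation on the samples has to happen in application code after decoding.

```python
import sqlite3
import struct

# Hypothetical sample encoding: 2-byte offset in seconds plus 8-byte double,
# little-endian with no padding (10 bytes per sample).
SAMPLE_FORMAT = "<Hd"

def pack_samples(samples):
    """Pack (offset_seconds, value) pairs into one binary blob."""
    return b"".join(struct.pack(SAMPLE_FORMAT, off, val) for off, val in samples)

def unpack_samples(blob):
    """Decode a blob back into a list of (offset_seconds, value) pairs."""
    return list(struct.iter_unpack(SAMPLE_FORMAT, blob))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ts_blob ("
    "  metric TEXT, window_start INTEGER, samples BLOB,"
    "  PRIMARY KEY (metric, window_start))"
)

# One row holds a whole time window of readings for one metric.
window = [(0, 20.5), (60, 21.0), (120, 21.5)]
conn.execute(
    "INSERT INTO ts_blob VALUES (?, ?, ?)",
    ("cpu.temp", 3600, pack_samples(window)),
)

# SQL can locate the row, but the samples column is opaque bytes to it:
# length() is about all it can say. There is no AVG over the values inside.
blob, blob_len = conn.execute(
    "SELECT samples, length(samples) FROM ts_blob "
    "WHERE metric = ? AND window_start = ?",
    ("cpu.temp", 3600),
).fetchone()

# The actual analysis happens outside the database, after decoding.
decoded = unpack_samples(blob)
mean_value = sum(value for _, value in decoded) / len(decoded)
```

In this sketch the database is reduced to a key-value lookup on (metric, window_start), which is the role a NoSQL table fills without carrying the relational machinery along.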
Transactions, in particular, are a problem here because even though they wouldn't be used,
the feature remains, at a cost. The requirement that a relational database support multirow
transactions makes these databases much more difficult to scale to multinode configurations.
Even getting really high performance out of a single node can require using a high-cost
system like Oracle. With a NoSQL system like Apache HBase or MapR-DB, you can instead
simply add hardware to get more performance.
This pattern of paying a penalty for unused features that get in the way of scaling a system
happens in a number of high-performance systems. It is common that the measures that must
be taken to scale a system inherently negate the virtues of a conventional relational database,
and if you attempt to apply them to a relational database, you still do not get the scaling you
desire. In such cases, moving to an alternative database like HBase or MapR-DB can have
substantial benefits because you gain both performance and scalability.
Hybrid Design: Where Can I Get One?
These hybrid wide/blob table designs can be very alluring. Their promise of enormous
performance levels is exciting, and the possibility that they can run on fault-tolerant,
Hadoop-based systems such as the MapR distribution makes them attractive from an operational
point of view as well. These new approaches are not speculation; they have been built and they do
provide stunning results. The description we've presented here so far, however, is largely
conceptual. What about real implementations? The next chapter addresses exactly how you
can realize these new designs by describing how you can use OpenTSDB, an open source
time series database tool, along with special open source MapR extensions. The result is a