NOSQL VERSUS RDBMS: WHAT'S THE DIFFERENCE, WHAT'S THE POINT?
NoSQL databases and relational databases share the same basic goals: to store and retrieve data and to coordinate changes. The difference is that NoSQL databases trade away some of the capabilities of relational databases in order to improve scalability. In particular, NoSQL databases typically have much simpler coordination capabilities than the transactions that traditional relational systems provide, or even none at all. NoSQL databases also usually eliminate all or most of the SQL query language and, importantly, the complex optimizer required for SQL to be useful.
The benefits of making this trade include greater simplicity in the NoSQL database, the ability to handle semi-structured and denormalized data, and potentially much higher scalability for the system. The drawbacks include a compensating increase in the complexity of the application and the loss of the abstraction provided by the query optimizer. Losing the optimizer means that much of the optimization of queries has to be done inside the developer's head and is frozen into the application code. Of course, losing the optimizer can also be an advantage, since it gives the developer much more predictable performance.
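To make that point concrete, here is a minimal sketch in plain Python, with an ordinary dict standing in for a generic key-value store; the store, record fields, and helper names are illustrative and not any particular product's API. In an RDBMS, "find users by city" is one declarative query and the optimizer chooses the access path; here the developer must design a secondary index, keep it in sync, and hard-code the plan:

```python
# Illustrative only: an ordinary dict stands in for a key-value store.
users = {}            # primary data: user_id -> record
users_by_city = {}    # hand-maintained secondary "index": city -> set of user_ids

def put_user(user_id, record):
    """Write a record and keep the secondary index in sync by hand."""
    old = users.get(user_id)
    if old is not None:
        users_by_city.get(old["city"], set()).discard(user_id)
    users[user_id] = record
    users_by_city.setdefault(record["city"], set()).add(user_id)

def users_in_city(city):
    """The 'query plan' is frozen here: consult the secondary index,
    then fetch each matching record by primary key."""
    return [users[uid] for uid in users_by_city.get(city, set())]

put_user("u1", {"name": "Ana", "city": "Lisbon"})
put_user("u2", {"name": "Bo", "city": "Oslo"})
put_user("u3", {"name": "Cy", "city": "Lisbon"})
print(sorted(u["name"] for u in users_in_city("Lisbon")))  # ['Ana', 'Cy']
```

The flip side mentioned above is visible here as well: the cost of `users_in_city` is always one index lookup plus one fetch per match, so its performance is predictable by construction.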
Over time, the originally hard-and-fast tradeoff of giving up transactions and SQL in return for the performance and scalability of a NoSQL database has become much more nuanced. Some NoSQL databases are adding new forms of transactions, although these provide much weaker guarantees than the transactions in an RDBMS. In addition, modern implementations of SQL such as the open source Apache Drill allow analysts and developers working with NoSQL applications to have full SQL language capability when they choose, while retaining scalability.
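As a rough illustration of what that looks like in practice, the sketch below submits an ANSI-style SQL query over semi-structured JSON to a local Apache Drill instance through its REST API. It assumes Drill is running on its default port (8047) and that the file path, alias, and field names, which are hypothetical, resolve through the dfs storage plugin:

```python
# Sketch: SQL over semi-structured JSON via Apache Drill's REST API.
# Assumes a local Drill instance on its default port; the file path and
# field names below are hypothetical.
import json
import urllib.request

sql = """
SELECT t.sensor_id, AVG(CAST(t.reading AS DOUBLE)) AS avg_reading
FROM dfs.`/data/sensors/2014.json` t
WHERE t.reading IS NOT NULL
GROUP BY t.sensor_id
"""

req = urllib.request.Request(
    "http://localhost:8047/query.json",
    data=json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

for row in result.get("rows", []):
    print(row)
```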
Until recently, the standard approach to dealing with large-scale time series data has been to decide from the start which data to sample, study a few weeks' or months' worth of the sampled data, produce the desired reports, summarize some results to be archived, and then discard most or all of the original data. Now that's changing. There is a golden opportunity to do broader and deeper analytics, exploring data that would previously have been discarded. At modern rates of data production, even a few weeks' or months' worth of data is a large enough volume to start to overwhelm traditional database methods. With the new scalable NoSQL platforms and tools for data storage and access, it's now feasible to archive years of raw or lightly processed data. These much finer-grained and longer histories are especially valuable for the modeling needed in predictive analytics, for anomaly detection, for back-testing new models, and for finding long-term trends and correlations.
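One pattern that helps make such long, fine-grained histories practical on a scalable key-value or wide-column store is to build the row key from the series name plus a coarse time bucket, so writes stay cheap and a back-test can range-scan only the buckets it needs. The following is a rough sketch in plain Python; the in-memory structures, key format, and function names are illustrative rather than any specific database's API:

```python
# Sketch: time-bucketed row keys for long, raw time series histories.
# Row key = metric name + UTC day bucket; each "row" holds that day's samples.
from collections import defaultdict
from datetime import datetime, timedelta, timezone

store = defaultdict(list)   # "metric|YYYYMMDD" -> list of (epoch_seconds, value)

def write(metric, ts, value):
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")
    store[f"{metric}|{day}"].append((ts, value))

def scan(metric, start_ts, end_ts):
    """Back-test style range query: touch only the day buckets in range."""
    out = []
    day = datetime.fromtimestamp(start_ts, tz=timezone.utc).date()
    last = datetime.fromtimestamp(end_ts, tz=timezone.utc).date()
    while day <= last:
        key = f"{metric}|{day.strftime('%Y%m%d')}"
        out.extend((t, v) for (t, v) in store.get(key, []) if start_ts <= t <= end_ts)
        day += timedelta(days=1)
    return out

write("cpu.load", 1_400_000_000, 0.42)
write("cpu.load", 1_400_003_600, 0.57)
print(scan("cpu.load", 1_399_990_000, 1_400_010_000))
```

In an actual wide-column store the same idea would typically appear as a compound row key, where the time bucket keeps individual rows small while sorted key order makes multi-year scans largely sequential.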
As a result of these new options, the number of situations in which data is being collected as time series is also expanding, as is the need for extremely reliable and high-performance time series databases.