Practical Time Series Tools - Time Series Databases


Using a library like this is an excellent way to get the benefits of the simple conceptual interface that Open TSDB provides combined with whatever your favorite language might be.

Using a package like this to access data stored in Open TSDB works relatively well for moderate volumes of data (up to a few hundred thousand data points, say), but it becomes increasingly sluggish as data volumes increase. Downsampling is a good approach to manage this, but downsampling discards information that you may need in your analysis. At some point, you may find that the amount of data that you are trying to retrieve from your database is simply too large, either because downloading the data takes too long or because analysis in tools like R or Go becomes too slow.
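
To make the cost of downsampling concrete, here is a minimal sketch in Python. It is not how Open TSDB implements downsampling; the function name `downsample_avg` and the sample data are made up for illustration. It averages points into fixed-width time buckets, which is one common form of downsampling.

```python
from collections import defaultdict


def downsample_avg(points, interval):
    """Average (timestamp, value) pairs into fixed-width buckets.

    `points` is an iterable of (unix_seconds, value) pairs and `interval`
    is the bucket width in seconds. Returns a list of (bucket_start, mean)
    pairs sorted by bucket start.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp down to the start of its bucket.
        buckets[ts - ts % interval].append(value)
    return [(start, sum(vals) / len(vals))
            for start, vals in sorted(buckets.items())]


# Hypothetical data: one sample every 10 seconds, with a single short spike.
raw = [(0, 1.0), (10, 1.0), (20, 1.0), (30, 97.0), (40, 1.0), (50, 1.0)]

# Downsample to one point per minute.
coarse = downsample_avg(raw, 60)
print(coarse)
```

Six points shrink to one, which is what makes retrieval cheaper, but the spike to 97.0 is no longer visible in the averaged result. An analysis that looks for short anomalies would need the raw points.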

If and when this happens, you will need to move to a more scalable analysis tool that can process the data in parallel.

Accessing Open TSDB Data Using SQL-on-Hadoop Tools

If you need to analyze large volumes of time series data beyond what works with the REST interface and downsampling, you probably also need to move to parallel execution of your analysis. At this point, it is usually best to access the contents of the Open TSDB data directly via the HBase API rather than depending on the REST interface that the TSD process provides.

You might expect to use SQL or the new SQL-on-Hadoop tools for this type of parallel access and analysis. Unfortunately, the wide table and blob formats that Open TSDB uses in order to get high performance can make it more difficult to access this data using SQL-based tools than you might expect. SQL as a language is not a great choice for actually analyzing time series data. When it comes to simply accessing data from Open TSDB, the usefulness of SQL depends strongly on which tool you select, as elaborated in the following sections. For some tools, the non-relational data formats used in Open TSDB can be difficult to access without substantial code development. In any case, special techniques that vary by tool are required to analyze time series data from Open TSDB. New SQL-on-Hadoop tools are being developed. In the next sections, we compare some of the currently available tools with regard to how well they let you access your time series data in Open TSDB.
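
To illustrate why these formats are awkward for SQL-based tools, the following Python sketch decodes one row key and one cell the way a custom reader would have to. It is a sketch under stated assumptions, not Open TSDB's own code: it assumes the commonly documented default layout (3-byte UIDs for metrics and tags, a 4-byte base timestamp in the row key, no key salting, and a 2-byte column qualifier for second-resolution points). The function names `parse_row_key` and `parse_cell` and the example bytes are hypothetical.

```python
import struct

UID_WIDTH = 3  # assumed default width of metric and tag UIDs, in bytes
TS_WIDTH = 4   # assumed base timestamp: unsigned 32-bit seconds


def parse_row_key(key):
    """Split a row key into (metric_uid, base_time, [(tagk_uid, tagv_uid)])."""
    metric = key[:UID_WIDTH]
    (base_time,) = struct.unpack(">I", key[UID_WIDTH:UID_WIDTH + TS_WIDTH])
    rest = key[UID_WIDTH + TS_WIDTH:]
    if len(rest) % (2 * UID_WIDTH) != 0:
        raise ValueError("tag section is not a whole number of UID pairs")
    tags = [(rest[i:i + UID_WIDTH], rest[i + UID_WIDTH:i + 2 * UID_WIDTH])
            for i in range(0, len(rest), 2 * UID_WIDTH)]
    return metric, base_time, tags


def parse_cell(base_time, qualifier, value):
    """Decode one uncompacted, second-resolution cell into (timestamp, value)."""
    if len(qualifier) != 2:
        raise ValueError("this sketch handles only single 2-byte qualifiers")
    (q,) = struct.unpack(">H", qualifier)
    offset = q >> 4           # upper 12 bits: seconds after base_time
    is_float = bool(q & 0x8)  # flag bit: floating point or integer
    length = (q & 0x7) + 1    # lower 3 bits: value length minus one
    if len(value) != length:
        raise ValueError("value length does not match qualifier flags")
    if is_float:
        fmt = {4: ">f", 8: ">d"}[length]
    else:
        fmt = {1: ">b", 2: ">h", 4: ">i", 8: ">q"}[length]
    (decoded,) = struct.unpack(fmt, value)
    return base_time + offset, decoded


# Hypothetical row: metric UID 000001, base hour 1400000400, one tag pair.
key = bytes.fromhex("000001") + struct.pack(">I", 1400000400) \
    + bytes.fromhex("000001000002")
metric, base_time, tags = parse_row_key(key)

# Hypothetical cell: 90 seconds into the hour, 8-byte integer value 42.
qualifier = struct.pack(">H", (90 << 4) | 0x7)
point = parse_cell(base_time, qualifier, struct.pack(">q", 42))
print(metric.hex(), base_time, tags, point)
```

Even in this simplified form, the timestamp of a data point is split between the row key and the column name, and the value type is encoded in flag bits. A tool that sees only rows and typed columns cannot express this without custom decoding code. Compacted rows, which as far as we understand pack many qualifiers and values into one cell, would need further handling that this sketch omits.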

Using Apache Spark SQL

Apache Spark SQL has some advantages in working with time series databases. Spark SQL is very different from Apache Hive in that it is embedded in and directly accessible from a
