Using a library like this is an excellent way to get the benefits of the simple conceptual
interface that Open TSDB provides combined with whatever your favorite language might be.
Using a package like this to access data stored in Open TSDB works relatively well for
moderate volumes of data (up to a few hundred thousand data points, say), but it becomes
increasingly sluggish as data volumes increase. Downsampling is a good approach to manage
this, but downsampling discards information that you may need in your analysis. At some
point, you may find that the amount of data that you are trying to retrieve from your database
is simply too large either because downloading the data takes too long or because analysis in
tools like R or Go becomes too slow.
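As a rough sketch of what downsampling looks like at the REST level, the following Python
builds a query URL for the TSD's /api/query endpoint with a downsampling specifier such as
1m-avg (one-minute averages). The host name, metric name, and tag used here are hypothetical,
and the helper build_query_url is our own illustration rather than part of any Open TSDB
client library.

```python
from urllib.parse import urlencode


def build_query_url(base_url, metric, start, aggregator="sum",
                    downsample=None, tags=None):
    """Build an Open TSDB /api/query URL (illustrative helper, not a library API).

    The metric expression has the form
    <aggregator>:[<downsample>:]<metric>[{tag=value,...}]
    """
    parts = [aggregator]
    if downsample:
        # e.g. "1m-avg" asks the TSD to average points into one-minute buckets
        parts.append(downsample)
    expr = ":".join(parts + [metric])
    if tags:
        # Sort the tags so the generated URL is deterministic
        tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
        expr += "{" + tag_str + "}"
    query = urlencode({"start": start, "m": expr})
    return f"{base_url.rstrip('/')}/api/query?{query}"


# Hypothetical TSD host, metric, and tag, for illustration only
url = build_query_url("http://tsd.example.com:4242", "sys.cpu.user",
                      "1h-ago", downsample="1m-avg", tags={"host": "web01"})
print(url)
```

Choosing a coarser interval (say, 1h-avg) reduces the number of points returned, at the cost
of hiding short-lived spikes that an average smooths away.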
If and when this happens, you will need to move to a more scalable analysis tool that can
process the data in parallel.
Accessing Open TSDB Data Using SQL-on-Hadoop Tools
If you need to analyze large volumes of time series data beyond what works with the REST
interface and downsampling, you probably also need to move to parallel execution of your
analysis. At this point, it is usually best to access the contents of the Open TSDB data
directly via the HBase API rather than depending on the REST interface that the TSD process
provides.
You might expect to use SQL or the new SQL-on-Hadoop tools for this type of parallel
access and analysis. Unfortunately, the wide table and blob formats that Open TSDB uses in
order to get high performance can make it more difficult to access this data using SQL-based
tools than you might expect. SQL as a language is not a great choice for actually analyzing
time series data. When it comes to simply accessing data from Open TSDB, the usefulness of
SQL depends strongly on which tool you select, as elaborated in the following sections. For
some tools, the non-relational data formats used in Open TSDB can be difficult to access
without substantial code development. In any case, special techniques that vary by tool are
required to analyze time series data from Open TSDB. New SQL-on-Hadoop tools are being
developed. In the next sections, we compare some of the currently available tools with regard
to how well they let you access time series data stored in Open TSDB.
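To give a feel for why these formats are awkward for SQL-based tools, here is a minimal
sketch that decodes the binary pieces of a single Open TSDB cell by hand. It assumes the
default layout as we understand it: unsalted row keys made of a 3-byte metric UID, a 4-byte
base timestamp, and 3-byte tag key and tag value UIDs, plus 2-byte column qualifiers at
second resolution. The function names and sample bytes are our own; a deployment with
different UID widths, salting, or millisecond qualifiers would need different code.

```python
import struct

UID_WIDTH = 3   # assumed default width of metric, tag key, and tag value UIDs
TS_WIDTH = 4    # base timestamp in seconds, aligned to the hour


def decode_row_key(key: bytes):
    """Split a row key into (metric_uid, base_time, [(tagk_uid, tagv_uid), ...])."""
    metric_uid = key[:UID_WIDTH]
    (base_time,) = struct.unpack(">I", key[UID_WIDTH:UID_WIDTH + TS_WIDTH])
    rest = key[UID_WIDTH + TS_WIDTH:]
    pair = 2 * UID_WIDTH
    if len(rest) % pair:
        raise ValueError("trailing bytes do not form whole tag key/value pairs")
    tags = [(rest[i:i + UID_WIDTH], rest[i + UID_WIDTH:i + pair])
            for i in range(0, len(rest), pair)]
    return metric_uid, base_time, tags


def decode_qualifier(qualifier: bytes):
    """Decode a 2-byte qualifier into (offset_seconds, is_float, value_length)."""
    (raw,) = struct.unpack(">H", qualifier)
    offset_seconds = raw >> 4          # upper 12 bits: seconds past the base time
    flags = raw & 0x0F                 # lower 4 bits: value format flags
    is_float = bool(flags & 0x08)      # high flag bit marks a floating-point value
    value_length = (flags & 0x07) + 1  # low 3 bits store (length in bytes - 1)
    return offset_seconds, is_float, value_length


# Made-up sample bytes, for illustration only
row_key = b"\x00\x00\x01" + struct.pack(">I", 1400000400) + b"\x00\x00\x01\x00\x00\x02"
print(decode_row_key(row_key))
print(decode_qualifier(b"\x01\xeb"))
```

Even after this decoding, the UIDs still have to be resolved to metric and tag names through
a separate lookup table, which is part of why a plain SQL SELECT over the raw table is of
limited use.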
Using Apache Spark SQL
Apache Spark SQL has some advantages in working with time series databases. Spark SQL
is very different from Apache Hive in that it is embedded in and directly accessible from a