What’s Next? - Time Series Databases

Database Reference

In-Depth Information

These types of sensors take an enormous number of measurements, which raises the issue of

how to make use of the enormous influx of data they produce. New methods are needed to

deal with the entire time series pipeline from sensor to insight. Sensor data must be collected

at the site of measurement and communicated. Transport technologies are needed to carry

this information to the platform used for central storage and analysis. That's where the meth-

ods for scalable time series databases come in. These new TSDB technologies lie at the heart

of the IoT and more.

This evolution is natural—doing new things calls for new tools, and time series databases for

very large-scale datasets are important tools. Services are emerging to provide technology

that is custom designed to handle large-scale time series data typical of sensor data. In this

book, however, we have focused on how to build your own time series database, one that is

cost effective and provides excellent performance at high data rates and very large volume.

We recommend using Apache Hadoop-based NoSQL platforms—such as Apache HBase or

MapR-DB—for building large-scale, non-relational time series databases because of their

scalability and the efficiency of data retrieval they provide for time series data. When is that

the right solution? In simple terms, a time series database is the right choice when you have a

very large amount of data that requires a scalable technology and when the queries you want

to make are mainly based on a time span.

New Options for Very High-Performance TSDBs

We've described some open source tools and new approaches to build large-scale time series

databases. These include open source tools such as Open TSDB, code extensions to modify

Open TSDB that were developed by MapR, and a convenient user interface called Grafana

that works with Open TSDB.

The design of the data workflow, data format, and table organization all affect performance

of a time series database. Data can be loaded into wide tables in a point-by-point manner in a

NoSQL-style, non-relational storage tier for better performance and scalability as compared

to a traditional relational database schema with one row per data point. For even faster re-

trieval, a hybrid table design can be achieved with a data flow that retrieves data from wide

table for compression into blobs and reloads the table with row compaction. Unmodified

Open TSDB produces this hybrid-style storage tier. To greatly improve the rate of ingestion,

you can make use of the new open source extensions developed by MapR to enable direct

blob insertion. This style also solves the problem of how to quickly ingest sufficient data to

Search WWH ::

Custom Search

Home