Database Reference
In-Depth Information
These types of sensors take an enormous number of measurements, which raises the issue of
how to make use of the enormous influx of data they produce. New methods are needed to
deal with the entire time series pipeline from sensor to insight. Sensor data must be collected
at the site of measurement and communicated. Transport technologies are needed to carry
this information to the platform used for central storage and analysis. That's where the meth-
ods for scalable time series databases come in. These new TSDB technologies lie at the heart
of the IoT and more.
This evolution is natural—doing new things calls for new tools, and time series databases for
very large-scale datasets are important tools. Services are emerging to provide technology
that is custom designed to handle large-scale time series data typical of sensor data. In this
book, however, we have focused on how to build your own time series database, one that is
cost effective and provides excellent performance at high data rates and very large volume.
We recommend using Apache Hadoop-based NoSQL platforms—such as Apache HBase or
MapR-DB—for building large-scale, non-relational time series databases because of their
scalability and the efficiency of data retrieval they provide for time series data. When is that
the right solution? In simple terms, a time series database is the right choice when you have a
very large amount of data that requires a scalable technology and when the queries you want
to make are mainly based on a time span.
New Options for Very High-Performance TSDBs
We've described some open source tools and new approaches to build large-scale time series
databases. These include open source tools such as Open TSDB, code extensions to modify
Open TSDB that were developed by MapR, and a convenient user interface called Grafana
that works with Open TSDB.
The design of the data workflow, data format, and table organization all affect performance
of a time series database. Data can be loaded into wide tables in a point-by-point manner in a
NoSQL-style, non-relational storage tier for better performance and scalability as compared
to a traditional relational database schema with one row per data point. For even faster re-
trieval, a hybrid table design can be achieved with a data flow that retrieves data from wide
table for compression into blobs and reloads the table with row compaction. Unmodified
Open TSDB produces this hybrid-style storage tier. To greatly improve the rate of ingestion,
you can make use of the new open source extensions developed by MapR to enable direct
blob insertion. This style also solves the problem of how to quickly ingest sufficient data to
Search WWH ::




Custom Search