Because the TSD does not keep state in memory, you can run multiple TSD processes without worrying about them stepping on each other. The TSD architecture shown here corresponds to the data flow depicted in the previous chapter in Figure 3-5 to produce hybrid-style tables. Note that the data catcher and the background blob maker of that figure are contained within the TSD component shown here in Figure 4-1.
User interface components such as the original Open TSDB user interface communicate directly with the TSD to retrieve data. The TSD retrieves the requested data from the storage tier, summarizes and aggregates it as requested, and returns the result. In the native Open TSDB user interface, the data is returned directly to the user's browser in the form of a PNG plot generated by the Gnuplot program. External interfaces and analysis scripts can use the PNG interface, but they more commonly use the REST interface of Open TSDB to read aggregated data in JSON form and generate their own visualizations.
Open TSDB suffers a bit in terms of ingestion performance because collectors send just a few data points at a time (typically just one point at a time) and because data is inserted in the wide table format before later being reformatted into blob format (this is the standard hybrid table data flow). Typically, it is unusual to be able to insert data into the wide table format at higher than about 10,000 data points per second per storage tier node. Getting ingestion rates up to or above a million data points per second therefore requires a large number of nodes in the storage tier, on the order of a hundred at those rates. Wanting faster ingestion is not just a matter of better performance always being attractive; many modern situations produce data at such volume and velocity that, in order to be able to store and analyze it as a time series, it is necessary to increase the data load rates of the time series database to do the projects at all.
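To make the limitation concrete, here is a simplified sketch of the per-point, wide-table insertion pattern just described, written against the standard HBase client API. The table name ("tsdb"), column family ("t"), readable string row keys, and hour-wide rows are assumptions made for illustration; the real Open TSDB client builds binary row keys from metric and tag UIDs rather than the strings used here.

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PointAtATimeWriter {
        public static void main(String[] args) throws IOException {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("tsdb"))) {
                long now = System.currentTimeMillis();
                long baseHour = now - (now % 3600_000L);   // one row per series per hour
                long offset = now - baseHour;              // one column per data point
                Put put = new Put(Bytes.toBytes("cpu.user:host=web01:" + baseHour));
                put.addColumn(Bytes.toBytes("t"), Bytes.toBytes(offset), Bytes.toBytes(42.0));
                table.put(put);                            // one small write per data point
            }
        }
    }

Every data point costs a separate small write to the storage tier, which is why per-node ingestion in this style tops out at roughly the rates quoted above.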
This limitation on bulk ingestion speed can be largely overcome by using an alternative ingestion program that writes data directly into the storage tier in blob format. We describe how this works in the next section.
Value Added: Direct Blob Loading for High Performance
An alternative to inserting each data point one by one is to buffer data in memory and insert a blob containing the entire batch. The trick is to move the blob maker upstream of the insertion into the storage tier, as described in Chapter 3 and Figure 3-6. The first time the data hits the table, it is already compressed as a blob. Inserting entire blobs of data this way helps if the time windows can be sized so that a large number of data points are included in each blob. Grouping data like this improves ingestion performance because the number of rows that need to be written to the storage tier is decreased by a factor equal to the average number of data points per blob.
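As a rough illustration of the blob maker moved upstream of insertion, the following sketch buffers points in memory by series and time window and writes each completed window to the storage tier as a single row. The table name ("tsdb-blob"), column family, one-hour window, and the naive count-plus-pairs serialization are assumptions for this example rather than the actual Open TSDB blob format, and a real loader would compress the serialized bytes before writing.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BlobLoader {
        private static final long WINDOW_MS = 3600_000L;   // one-hour blob windows (assumed)
        private final Map<String, List<long[]>> buffers = new HashMap<>();

        /** Buffer one sample in memory; nothing is written to the storage tier yet. */
        public void add(String series, long timestampMs, double value) {
            long window = timestampMs - (timestampMs % WINDOW_MS);
            String rowKey = series + ":" + window;
            buffers.computeIfAbsent(rowKey, k -> new ArrayList<>())
                   .add(new long[] { timestampMs, Double.doubleToLongBits(value) });
        }

        /** Write each buffered window as a single blob row instead of one row per point. */
        public void flush(Connection connection) throws IOException {
            try (Table table = connection.getTable(TableName.valueOf("tsdb-blob"))) {
                for (Map.Entry<String, List<long[]>> e : buffers.entrySet()) {
                    Put put = new Put(Bytes.toBytes(e.getKey()));
                    put.addColumn(Bytes.toBytes("t"), Bytes.toBytes("blob"),
                                  serialize(e.getValue()));
                    table.put(put);                         // one write per window, not per point
                }
            }
            buffers.clear();
        }

        /** Naive blob format: a count followed by (timestamp, value) pairs. */
        private static byte[] serialize(List<long[]> points) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(points.size());
            for (long[] p : points) {
                out.writeLong(p[0]);
                out.writeLong(p[1]);
            }
            out.flush();
            return bytes.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create())) {
                BlobLoader loader = new BlobLoader();
                loader.add("cpu.user:host=web01", System.currentTimeMillis(), 42.0);
                loader.flush(conn);
            }
        }
    }

With, say, 1,000 points buffered per window, 1,000 single-point writes collapse into one row write, which is exactly the row-count reduction described above.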