Because the TSD does not keep state in memory, you can run multiple TSD processes without worrying about them stepping on each other. The TSD architecture shown here corresponds to the data flow depicted in the previous chapter in Figure 3-5 to produce hybrid-style tables. Note that the data catcher and the background blob maker of that figure are contained within the TSD component shown here in Figure 4-1.
User interface components such as the original Open TSDB user interface communicate directly with the TSD to retrieve data. The TSD retrieves the requested data from the storage tier, summarizes and aggregates it as requested, and returns the result. In the native Open TSDB user interface, the data is returned directly to the user's browser in the form of a PNG plot generated by the Gnuplot program. External interfaces and analysis scripts can use the PNG interface, but they more commonly use the REST interface of Open TSDB to read aggregated data in JSON form and generate their own visualizations.
Open TSDB suffers a bit in terms of ingestion performance because collectors send just a few data points at a time (typically just one point at a time) and because data is inserted in the wide table format before later being reformatted into blob format (this is the standard hybrid table data flow). Typically, it is unusual to be able to insert data into the wide table format at higher than about 10,000 data points per second per storage tier node. Getting ingestion rates up to or above a million data points per second therefore requires a large number of nodes in the storage tier, on the order of a hundred at those rates. Wanting faster ingestion is not just a matter of better performance always being attractive; many modern situations produce data at such volume and velocity that, in order to be able to store and analyze it as a time series, it is necessary to increase the data load rates of the time series database to do the projects at all.
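To make the limitation concrete, here is a simplified sketch of the per-point, wide-table insertion pattern just described, written against the standard HBase client API. The table name ("tsdb"), column family ("t"), readable string row keys, and hour-wide rows are assumptions made for illustration; the real Open TSDB client builds binary row keys from metric and tag UIDs rather than the strings used here.

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PointAtATimeWriter {
        public static void main(String[] args) throws IOException {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("tsdb"))) {
                long now = System.currentTimeMillis();
                long baseHour = now - (now % 3600_000L);   // one row per series per hour
                long offset = now - baseHour;              // one column per data point
                Put put = new Put(Bytes.toBytes("cpu.user:host=web01:" + baseHour));
                put.addColumn(Bytes.toBytes("t"), Bytes.toBytes(offset), Bytes.toBytes(42.0));
                table.put(put);                            // one small write per data point
            }
        }
    }

Every data point costs a separate small write to the storage tier, which is why per-node ingestion in this style tops out at roughly the rates quoted above.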
This limitation on bulk ingestion speed can be largely overcome by using an alternative ingestion program that writes data directly into the storage tier in blob format. We describe how this works in the next section.
Value Added: Direct Blob Loading for High Performance
An alternative to inserting each data point one by one is to buffer data in memory and insert a blob containing the entire batch. The trick is to move the blob maker upstream of the insertion into the storage tier, as described in Chapter 3 and Figure 3-6. The first time the data hits the table, it is already compressed as a blob. Inserting entire blobs of data this way helps if the time windows can be sized so that a large number of data points are included in each blob. Grouping data like this improves ingestion performance because the number of rows that need to be written to the storage tier is decreased by a factor equal to the average number of data points per blob.
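As a rough illustration of the blob maker moved upstream of insertion, the following sketch buffers points in memory by series and time window and writes each completed window to the storage tier as a single row. The table name ("tsdb-blob"), column family, one-hour window, and the naive count-plus-pairs serialization are assumptions for this example rather than the actual Open TSDB blob format, and a real loader would compress the serialized bytes before writing.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BlobLoader {
        private static final long WINDOW_MS = 3600_000L;   // one-hour blob windows (assumed)
        private final Map<String, List<long[]>> buffers = new HashMap<>();

        /** Buffer one sample in memory; nothing is written to the storage tier yet. */
        public void add(String series, long timestampMs, double value) {
            long window = timestampMs - (timestampMs % WINDOW_MS);
            String rowKey = series + ":" + window;
            buffers.computeIfAbsent(rowKey, k -> new ArrayList<>())
                   .add(new long[] { timestampMs, Double.doubleToLongBits(value) });
        }

        /** Write each buffered window as a single blob row instead of one row per point. */
        public void flush(Connection connection) throws IOException {
            try (Table table = connection.getTable(TableName.valueOf("tsdb-blob"))) {
                for (Map.Entry<String, List<long[]>> e : buffers.entrySet()) {
                    Put put = new Put(Bytes.toBytes(e.getKey()));
                    put.addColumn(Bytes.toBytes("t"), Bytes.toBytes("blob"),
                                  serialize(e.getValue()));
                    table.put(put);                         // one write per window, not per point
                }
            }
            buffers.clear();
        }

        /** Naive blob format: a count followed by (timestamp, value) pairs. */
        private static byte[] serialize(List<long[]> points) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(points.size());
            for (long[] p : points) {
                out.writeLong(p[0]);
                out.writeLong(p[1]);
            }
            out.flush();
            return bytes.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create())) {
                BlobLoader loader = new BlobLoader();
                loader.add("cpu.user:host=web01", System.currentTimeMillis(), 42.0);
                loader.flush(conn);
            }
        }
    }

With, say, 1,000 points buffered per window, 1,000 single-point writes collapse into one row write, which is exactly the row-count reduction described above.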