Figure 3-5. Data flow for the hybrid style of time series database. Data arrives at the catcher from the sources and is inserted into the NoSQL database. In the background, the blob maker later rewrites the data in compressed blob form. Data is retrieved and reformatted by the renderer.
Going One Step Further: The Direct Blob Insertion Design
Compression of old data still leaves one performance bottleneck in place. Since data is inserted in the uncompressed format, the arrival of each data point requires a row update operation to insert the value into the database. This row update can limit the insertion rate for data to as little as 20,000 data points per second per node in the cluster.
On the other hand, the direct blob insertion data flow diagrammed in Figure 3-6 allows the insertion rate to be increased by as much as roughly 1,000-fold. How does the direct blob approach get this bump in performance? The essential difference is that the blob maker has been moved into the data flow between the catcher and the NoSQL time series database. This way, the blob maker can use incoming data from a memory cache rather than extracting its input from wide table rows already stored in the storage tier.
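To make this concrete, here is a minimal sketch of the direct blob insertion path. All names (`DirectBlobIngester`, `db_insert`, the one-hour window size, and the `zlib`-compressed packed-double blob format) are illustrative assumptions, not the book's actual implementation: the point is only that the catcher buffers samples in memory and the blob maker turns a whole time window into a single compressed row.

```python
import struct
import zlib
from collections import defaultdict

# Assumed window size: one blob per series per hour.
WINDOW_SECONDS = 3600

class DirectBlobIngester:
    """Hypothetical sketch: the blob maker sits between the catcher and
    the database, so blobs are built from an in-memory cache instead of
    from wide table rows already stored in the storage tier."""

    def __init__(self, db_insert):
        self.db_insert = db_insert          # callable(row_key, blob_bytes)
        self.cache = defaultdict(list)      # (series, window) -> [(t, v), ...]

    def catch(self, series, t, value):
        """Catcher: buffer the sample in memory instead of updating a row."""
        window = int(t) // WINDOW_SECONDS
        self.cache[(series, window)].append((t, value))

    def flush_window(self, series, window):
        """Blob maker: compress all cached samples for one window into a
        single blob and insert it as one row."""
        samples = sorted(self.cache.pop((series, window)))
        packed = b"".join(struct.pack("<dd", t, v) for t, v in samples)
        blob = zlib.compress(packed)
        # One database insert per blob rather than one per data point.
        self.db_insert(f"{series}:{window}", blob)
```

Because the database sees one insert per blob instead of one row update per data point, the per-row update cost that capped ingestion at tens of thousands of points per second is amortized over every sample in the window.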
The basic idea is that data is kept in memory as samples arrive. These samples are also written to log files. These log files are the “restart logs” shown in Figure 3-6 and are flat files that are stored on the Hadoop system but not as part of the storage tier itself. The restart logs allow the in-memory cache to be repopulated if the data ingestion pipeline has to be restarted.
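The restart-log idea can be sketched as a simple append-and-replay pair. The fixed-width record layout, function names, and file handling below are assumptions for illustration; what matters is that each sample is appended to a flat file as it arrives, and on restart the log is read back to rebuild the in-memory cache.

```python
import os
import struct

# Assumed record layout: fixed-width series name, then time and value
# as little-endian doubles.
RECORD = struct.Struct("<32sdd")

def log_sample(log_file, series, t, value):
    """Append one sample record to the restart log and force it to disk,
    so the sample survives a crash of the ingestion pipeline."""
    log_file.write(RECORD.pack(series.encode().ljust(32, b"\0"), t, value))
    log_file.flush()
    os.fsync(log_file.fileno())

def replay(path):
    """Rebuild the in-memory cache by replaying the restart log."""
    cache = {}
    with open(path, "rb") as f:
        while chunk := f.read(RECORD.size):
            name, t, v = RECORD.unpack(chunk)
            cache.setdefault(name.rstrip(b"\0").decode(), []).append((t, v))
    return cache
```

Since the log is only read during a restart, it can live as an ordinary flat file on the Hadoop system, outside the storage tier, exactly as the text describes.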