Figure 3-5. Data flow for the hybrid style of time series database. Data arrives at the catcher from the sources and is inserted into the NoSQL database. In the background, the blob maker later rewrites the data in compressed blob form. Data is retrieved and reformatted by the renderer.
Going One Step Further: The Direct Blob Insertion Design
Compression of old data still leaves one performance bottleneck in place. Since data is inserted in the uncompressed format, the arrival of each data point requires a row update operation to insert the value into the database. This row update can limit the insertion rate for data to as little as 20,000 data points per second per node in the cluster.
On the other hand, the direct blob insertion data flow diagrammed in Figure 3-6 allows the insertion rate to be increased by as much as roughly 1,000-fold. How does the direct blob approach get this bump in performance? The essential difference is that the blob maker has been moved into the data flow between the catcher and the NoSQL time series database. This way, the blob maker can use incoming data from a memory cache rather than extracting its input from wide table rows already stored in the storage tier.
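To make this concrete, here is a minimal sketch of the direct blob insertion path. All names (`DirectBlobIngester`, `db_insert`, the one-hour window size, and the `zlib`-compressed packed-double blob format) are illustrative assumptions, not the book's actual implementation: the point is only that the catcher buffers samples in memory and the blob maker turns a whole time window into a single compressed row.

```python
import struct
import zlib
from collections import defaultdict

# Assumed window size: one blob per series per hour.
WINDOW_SECONDS = 3600

class DirectBlobIngester:
    """Hypothetical sketch: the blob maker sits between the catcher and
    the database, so blobs are built from an in-memory cache instead of
    from wide table rows already stored in the storage tier."""

    def __init__(self, db_insert):
        self.db_insert = db_insert          # callable(row_key, blob_bytes)
        self.cache = defaultdict(list)      # (series, window) -> [(t, v), ...]

    def catch(self, series, t, value):
        """Catcher: buffer the sample in memory instead of updating a row."""
        window = int(t) // WINDOW_SECONDS
        self.cache[(series, window)].append((t, value))

    def flush_window(self, series, window):
        """Blob maker: compress all cached samples for one window into a
        single blob and insert it as one row."""
        samples = sorted(self.cache.pop((series, window)))
        packed = b"".join(struct.pack("<dd", t, v) for t, v in samples)
        blob = zlib.compress(packed)
        # One database insert per blob rather than one per data point.
        self.db_insert(f"{series}:{window}", blob)
```

Because the database sees one insert per blob instead of one row update per data point, the per-row update cost that capped ingestion at tens of thousands of points per second is amortized over every sample in the window.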
The basic idea is that data is kept in memory as samples arrive. These samples are also written to log files. These log files are the “restart logs” shown in Figure 3-6 and are flat files that are stored on the Hadoop system but not as part of the storage tier itself. The restart logs allow the in-memory cache to be repopulated if the data ingestion pipeline has to be restarted.
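The restart-log idea can be sketched as a simple append-and-replay pair. The fixed-width record layout, function names, and file handling below are assumptions for illustration; what matters is that each sample is appended to a flat file as it arrives, and on restart the log is read back to rebuild the in-memory cache.

```python
import os
import struct

# Assumed record layout: fixed-width series name, then time and value
# as little-endian doubles.
RECORD = struct.Struct("<32sdd")

def log_sample(log_file, series, t, value):
    """Append one sample record to the restart log and force it to disk,
    so the sample survives a crash of the ingestion pipeline."""
    log_file.write(RECORD.pack(series.encode().ljust(32, b"\0"), t, value))
    log_file.flush()
    os.fsync(log_file.fileno())

def replay(path):
    """Rebuild the in-memory cache by replaying the restart log."""
    cache = {}
    with open(path, "rb") as f:
        while chunk := f.read(RECORD.size):
            name, t, v = RECORD.unpack(chunk)
            cache.setdefault(name.rstrip(b"\0").decode(), []).append((t, v))
    return cache
```

Since the log is only read during a restart, it can live as an ordinary flat file on the Hadoop system, outside the storage tier, exactly as the text describes.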