Storing and Processing Time Series Data - Time Series Databases

Chapter 3. Storing and Processing Time Series Data

As we mentioned in previous chapters, a time series is a sequence of values, each paired with a
timestamp indicating when the value was recorded. Time series entries are rarely amended, and
the data is often retrieved by reading a contiguous sequence of samples, possibly summarizing
or aggregating the samples as they are retrieved. A time series database is a way to store
multiple time series such that queries retrieving data from one or a few time series over a
particular time range are especially efficient. Applications in which time range queries
predominate are therefore often good candidates for implementation using a time series
database. As previously explained, the main topic of this book is the storage and processing
of large-scale time series data, and for this purpose, the preferred technologies are NoSQL
databases such as Apache HBase or MapR-DB.
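
To make the access pattern concrete, here is a minimal in-memory sketch in Python of the idea described above: several named series, append-only writes, and a query that reads one contiguous time range and aggregates it. The class name TinySeriesStore, its methods, and the series name "cpu.host1" are hypothetical and chosen only for illustration; this is a toy under the assumption that samples arrive in time order, not how HBase or MapR-DB store data.

```python
from bisect import bisect_left
from collections import defaultdict


class TinySeriesStore:
    """Toy in-memory store: one sorted list of timestamps per series."""

    def __init__(self):
        self._times = defaultdict(list)   # series name -> sorted timestamps
        self._values = defaultdict(list)  # series name -> values, same order

    def append(self, series, timestamp, value):
        times = self._times[series]
        # Assumption: samples arrive in time order and are never amended.
        if times and timestamp < times[-1]:
            raise ValueError("out-of-order sample")
        times.append(timestamp)
        self._values[series].append(value)

    def range_query(self, series, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        times = self._times.get(series, [])
        lo = bisect_left(times, start)  # first sample at or after start
        hi = bisect_left(times, end)    # first sample at or after end
        return list(zip(times[lo:hi], self._values[series][lo:hi]))


store = TinySeriesStore()
for t in range(0, 100, 10):
    store.append("cpu.host1", t, t * 0.5)

# Read one contiguous window, then aggregate the retrieved samples.
window = store.range_query("cpu.host1", 20, 50)
average = sum(value for _, value in window) / len(window)
```

Because each series is kept sorted by time, a range query costs two binary searches plus a sequential read of the matching samples, which is the property a time series database tries to preserve at much larger scale.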

Pragmatic advice for practical implementations of large-scale time series databases is the
goal of this book, so we need to focus on some basic steps that simplify and strengthen the
process for real-world applications. We will look briefly at approaches that may be useful for
small or medium-sized datasets and then delve more deeply into our main concern: how to
implement large-scale time series databases (TSDBs).

To get to a solid implementation, there are a number of design decisions to make. The drivers
for these decisions are the parameters that define the data. How many distinct time series are
there? What kind of data is being acquired? At what rate is the data being acquired? For how
long must the data be kept? The answers to these questions help determine the best
implementation strategy.
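
As a rough illustration of how these parameters interact, the following Python sketch turns the number of series, the sampling rate, and the retention period into an ingest rate and a raw storage estimate. The function name estimate_load and the example figures are hypothetical, and the default of 16 bytes per sample (for example, an 8-byte timestamp plus an 8-byte value) is an assumption that stands in for the "what kind of data" question; real systems add overhead or save space through compression.

```python
SECONDS_PER_DAY = 86_400


def estimate_load(num_series, samples_per_second, retention_days, bytes_per_sample=16):
    """Rough ingest rate and raw storage estimate from the design parameters."""
    # Samples written per second, summed across all series.
    ingest_rate = num_series * samples_per_second
    # Samples that must be held at once for the full retention period.
    total_samples = ingest_rate * retention_days * SECONDS_PER_DAY
    # Raw size before any compression, indexing, or replication overhead.
    raw_bytes = total_samples * bytes_per_sample
    return ingest_rate, total_samples, raw_bytes


# Hypothetical workload: 10,000 series, one sample per second each, kept for a year.
rate, samples, raw_bytes = estimate_load(
    num_series=10_000, samples_per_second=1, retention_days=365
)
print(f"{rate} samples/s, {samples} samples, {raw_bytes / 1e12:.2f} TB raw")
```

Even this modest hypothetical workload reaches hundreds of billions of samples per year, which suggests why the answers to these questions can push a design from a small or medium-sized approach toward a large-scale one.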
