Chapter 3. Storing and Processing Time Series Data
As we mentioned in previous chapters, a time series is a sequence of values, each with a time
value indicating when the value was recorded. Time series data entries are rarely amended,
and time series data is often retrieved by reading a contiguous sequence of samples, possibly
summarizing or aggregating the samples as they are retrieved. A time series database is a way
to store multiple time series such that queries to retrieve data from one or a few time series
for a given time range are particularly efficient. As such, applications for which time range
queries predominate are often good candidates for implementation using a time series
database. As previously explained, the main topic of this book is the storage and processing
of large-scale time series data, and for this purpose, the preferred technologies are NoSQL
(non-relational) databases such as Apache HBase or MapR-DB.
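
To make the access pattern concrete, here is a minimal in-memory sketch in Python. It is not how HBase or MapR-DB store data; the class name TinyTimeSeriesStore and its methods are hypothetical, chosen only to illustrate the idea that samples kept in time order allow a time range to be read as one contiguous slice and aggregated as it is read.

```python
from bisect import bisect_left
from collections import defaultdict


class TinyTimeSeriesStore:
    """Illustrative only: one sorted list of timestamps per series."""

    def __init__(self):
        self._times = defaultdict(list)   # series name -> sorted timestamps
        self._values = defaultdict(list)  # series name -> values, same order

    def append(self, series, timestamp, value):
        # Entries are rarely amended, so this sketch only supports
        # appending samples in non-decreasing time order.
        times = self._times[series]
        if times and timestamp < times[-1]:
            raise ValueError("samples must arrive in time order")
        times.append(timestamp)
        self._values[series].append(value)

    def range(self, series, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        times = self._times.get(series, [])
        values = self._values.get(series, [])
        lo = bisect_left(times, start)  # first index with time >= start
        hi = bisect_left(times, end)    # first index with time >= end
        return list(zip(times[lo:hi], values[lo:hi]))

    def mean(self, series, start, end):
        """Aggregate a time range; returns None if the range is empty."""
        samples = self.range(series, start, end)
        if not samples:
            return None
        return sum(v for _, v in samples) / len(samples)
```

Because the timestamps are sorted, locating the range takes two binary searches and the result is a contiguous slice, which is the property a time series database tries to preserve at much larger scale.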
The goal of this book is pragmatic advice for practical implementations of large-scale time
series databases, so we need to focus on some basic steps that simplify and strengthen the
process for real-world applications. We will look briefly at approaches that may be useful for
small or medium-sized datasets and then delve more deeply into our main concern: how to
implement large-scale time series databases (TSDBs).
To get to a solid implementation, there are a number of design decisions to make. The drivers
for these decisions are the parameters that define the data. How many distinct time series are
there? What kind of data is being acquired? At what rate is the data being acquired? For how
long must the data be kept? The answers to these questions help determine the best
implementation strategy.
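
As a rough illustration of how these parameters interact, the following Python sketch turns the questions above into a back-of-the-envelope size estimate. The function names and the 16 bytes per sample default are assumptions made for this example, not figures from the text; real storage cost depends on encoding, compression, and per-row overhead.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_samples(num_series, samples_per_second, retention_days):
    """Total samples held if every series is sampled at the same rate."""
    return num_series * samples_per_second * retention_days * SECONDS_PER_DAY


def estimate_raw_bytes(total_samples, bytes_per_sample=16):
    """Raw size, assuming (hypothetically) an 8-byte timestamp plus an
    8-byte value per sample and no compression or storage overhead."""
    return total_samples * bytes_per_sample


# Hypothetical workload: 1,000 series, one sample per second, kept for a year.
samples = estimate_samples(num_series=1000, samples_per_second=1,
                           retention_days=365)
raw_bytes = estimate_raw_bytes(samples)
print(samples)    # 31536000000
print(raw_bytes)  # 504576000000
```

Under these assumptions, even a modest 1,000 series produce about 31.5 billion samples per year, or roughly half a terabyte of raw data, which is why the number of series, the acquisition rate, and the retention period drive the choice of implementation.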