to keep the number of series per file small multiplies the number of files. Likewise, shortening the partition time multiplies the number of files. When storing data on a system such as Apache Hadoop using HDFS, a large number of files can cause serious stability problems. Advanced Hadoop-based systems like MapR can easily handle the number of files involved, but retrieving and managing large numbers of very small files is still inefficient because of the extra seek time required.
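To get a feel for how quickly this multiplication adds up, consider a rough back-of-the-envelope estimate. The numbers below are purely hypothetical, chosen only to show the effect:

# Hypothetical sizing sketch: file count for partitioned flat-file storage.
num_series = 1_000_000    # distinct time series (assumed)
series_per_file = 100     # series stored together in each file (assumed)
retention_days = 365      # how long data is retained (assumed)
partition_days = 7        # width of each time partition (assumed)

files = (num_series / series_per_file) * (retention_days / partition_days)
print(f"approximate file count: {files:,.0f}")   # about 521,000 files

Halving either series_per_file or partition_days doubles the total, which is exactly the multiplication effect described above.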
To avoid these problems, a natural step is to move to some form of real database to store the data. The best way to do this is not entirely obvious, however, as you have several choices about the type of database and its design. We will examine the issues to help you decide.
Moving Up to a Real Database: But Will RDBMS Suffice?
Because even well-partitioned flat files will eventually fall short for large-scale time series data, you will want to consider some type of true database. When first storing time series data in a database, it is tempting to use a so-called star schema design and to store the data in a relational database (RDBMS). In such a design, the core data is stored in a fact table that looks something like the one shown in Figure 3-2.
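As a concrete sketch of that idea, the snippet below builds a minimal star schema in SQLite: a fact table holding one row per measurement, plus a small dimension table describing each series. The table and column names are hypothetical illustrations, not taken from Figure 3-2:

import sqlite3

# A minimal sketch of a star-schema layout for time series data.
# Table and column names are hypothetical, not taken from Figure 3-2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE series_dim (
    id     INTEGER PRIMARY KEY,  -- surrogate key for each series
    name   TEXT,                 -- e.g., metric or sensor name
    source TEXT                  -- e.g., machine or location
);
CREATE TABLE metric_fact (
    ts        INTEGER NOT NULL,  -- measurement timestamp
    series_id INTEGER NOT NULL REFERENCES series_dim(id),
    value     REAL NOT NULL      -- the measured value itself
);
CREATE INDEX idx_series_ts ON metric_fact (series_id, ts);
""")

# One reading, retrieved by joining the fact table to its dimension.
conn.execute("INSERT INTO series_dim VALUES (1, 'temperature', 'pump-7')")
conn.execute("INSERT INTO metric_fact VALUES (1700000000, 1, 21.5)")
print(conn.execute(
    "SELECT d.name, d.source, f.ts, f.value "
    "FROM metric_fact f JOIN series_dim d ON f.series_id = d.id"
).fetchone())
# -> ('temperature', 'pump-7', 1700000000, 21.5)

Note that this layout costs one fact-table row per measurement, which is the property that matters most when asking whether an RDBMS will suffice at scale.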