to keep the number of series per file small multiplies the number of files. Likewise, shortening the partition time multiplies the number of files. When storing data on a system such as Apache Hadoop using HDFS, a large number of files can cause serious stability problems. Advanced Hadoop-based systems like MapR can easily handle the number of files involved, but retrieving and managing large numbers of very small files is still inefficient because of the extra seek time required.
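To get a feel for how quickly this multiplication adds up, consider a rough back-of-the-envelope estimate. The numbers below are purely hypothetical, chosen only to show the effect:

# Hypothetical sizing sketch: file count for partitioned flat-file storage.
num_series = 1_000_000    # distinct time series (assumed)
series_per_file = 100     # series stored together in each file (assumed)
retention_days = 365      # how long data is retained (assumed)
partition_days = 7        # width of each time partition (assumed)

files = (num_series / series_per_file) * (retention_days / partition_days)
print(f"approximate file count: {files:,.0f}")   # about 521,000 files

Halving either series_per_file or partition_days doubles the total, which is exactly the multiplication effect described above.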
To avoid these problems, a natural step is to move to some form of real database to store the data. The best way to do this is not entirely obvious, however, as you have several choices about the type of database and its design. We will examine the issues to help you decide.
Moving Up to a Real Database: But Will RDBMS Suffice?
Because even well-partitioned flat files will eventually fall short for large-scale time series data, you will want to consider some type of true database. When first storing time series data in a database, it is tempting to use a so-called star schema design and to store the data in a relational database (RDBMS). In such a design, the core data is stored in a fact table that looks something like the one shown in Figure 3-2.
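As a concrete sketch of that idea, the snippet below builds a minimal star schema in SQLite: a fact table holding one row per measurement, plus a small dimension table describing each series. The table and column names are hypothetical illustrations, not taken from Figure 3-2:

import sqlite3

# A minimal sketch of a star-schema layout for time series data.
# Table and column names are hypothetical, not taken from Figure 3-2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE series_dim (
    id     INTEGER PRIMARY KEY,  -- surrogate key for each series
    name   TEXT,                 -- e.g., metric or sensor name
    source TEXT                  -- e.g., machine or location
);
CREATE TABLE metric_fact (
    ts        INTEGER NOT NULL,  -- measurement timestamp
    series_id INTEGER NOT NULL REFERENCES series_dim(id),
    value     REAL NOT NULL      -- the measured value itself
);
CREATE INDEX idx_series_ts ON metric_fact (series_id, ts);
""")

# One reading, retrieved by joining the fact table to its dimension.
conn.execute("INSERT INTO series_dim VALUES (1, 'temperature', 'pump-7')")
conn.execute("INSERT INTO metric_fact VALUES (1700000000, 1, 21.5)")
print(conn.execute(
    "SELECT d.name, d.source, f.ts, f.value "
    "FROM metric_fact f JOIN series_dim d ON f.series_id = d.id"
).fetchone())
# -> ('temperature', 'pump-7', 1700000000, 21.5)

Note that this layout costs one fact-table row per measurement, which is the property that matters most when asking whether an RDBMS will suffice at scale.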