Managing time series data
We often need to store time series data in a database. Such a table may accumulate data
over years, so its size keeps growing. Consider the example of an order table that stores
the orders made by customers. You can choose the order ID as the hash key and the
date/time as the range key. This strategy would certainly segregate the data, and you
would be able to query data by order ID and date/time easily. However, there is a problem
with this approach: recent data is likely to be accessed more frequently than older data.
As a result, some partitions may become hot partitions while others remain cold
partitions. To solve this problem, it is recommended to create tables based on time
range, which means creating a new table for each week or month instead of saving all the
data in a single table. This strategy helps avoid hot and cold partitions, and you can
query the data for a particular time range from the corresponding table. It also helps
when you need to purge data, since you can simply drop the tables you no longer need.
Alternatively, you can archive that data as flat files on AWS S3, which is a cheap data
storage service from Amazon.
We are going to see how to integrate AWS S3 with DynamoDB in Chapter 6, Integrating
DynamoDB with Other AWS Components.
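
The table-per-time-range idea can be sketched in a few lines of Python. This is only an
illustration under assumptions that are not in the text: one table per calendar month, a
hypothetical naming scheme of the form orders_YYYY_MM, and hypothetical helper names
(table_for, tables_between, tables_to_purge). It only computes table names; no DynamoDB
calls are made.

```python
from datetime import date
from typing import List

# Hypothetical naming scheme (an assumption): one table per calendar month,
# named like "orders_2014_03". Zero-padding keeps names sortable as strings.
TABLE_PREFIX = "orders"


def _name(year: int, month: int) -> str:
    return f"{TABLE_PREFIX}_{year:04d}_{month:02d}"


def table_for(day: date) -> str:
    """Return the monthly table that would hold an order placed on `day`."""
    return _name(day.year, day.month)


def tables_between(start: date, end: date) -> List[str]:
    """List the monthly tables covering start..end (inclusive), oldest first."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(_name(year, month))
        # Step to the next calendar month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


def tables_to_purge(existing: List[str], today: date, keep_months: int) -> List[str]:
    """Return the tables older than the `keep_months` most recent months.

    The current month counts as one of the months to keep.
    """
    # Number months consecutively so that subtraction crosses year boundaries.
    oldest_kept = today.year * 12 + (today.month - 1) - (keep_months - 1)
    cutoff = _name(oldest_kept // 12, oldest_kept % 12 + 1)
    prefix = TABLE_PREFIX + "_"
    return sorted(n for n in existing if n.startswith(prefix) and n < cutoff)
```

A writer would send each new order to table_for(order_date), a reader would query only
the tables returned by tables_between for the range of interest, and a cleanup job could
archive or drop whatever tables_to_purge returns.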