ically. Once the hashing is done, rows with similar hash values are stored together
automatically, on different servers if necessary, to satisfy the specified provisioned
throughput capacity.
While discussing the hash type key (primary key and index) in earlier chapters, have you
ever wondered why the hash key is mandatory when creating a table? We all know what the
range key does: it simply sorts items based on the range key value. So far, we might have
been thinking that the range key is more important than the hash key. That view holds
only as long as we need neither higher throughput for our table nor any partitions for
it. While the table is small, the importance of the hash key is felt only when writing a
query operation. However, once the table grows, DynamoDB needs to partition our table
data based on this hash key (as shown in the previous diagram) in order to sustain the
same provisioned throughput.
This partitioning of table items based on the hash key attribute is called sharding. It means
the table is split by item, not by attribute: each item is stored whole in one partition.
This is why a query that specifies the hash key (of the table or of an index) retrieves
items much faster: DynamoDB can go straight to the partition that holds them.
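
The idea can be sketched in a few lines of Python. This is only an illustration, not
DynamoDB's actual implementation: the hash function DynamoDB uses is internal, so MD5
modulo a fixed partition count stands in for it here, and the names partition_for, shard,
query, and NUM_PARTITIONS, as well as the sample items, are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; DynamoDB decides the real number itself


def partition_for(hash_key, num_partitions=NUM_PARTITIONS):
    """Map a hash key value to a partition index (MD5 is a stand-in hash)."""
    digest = hashlib.md5(hash_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def shard(items, hash_key_attr):
    """Place each item, whole, into the partition chosen by its hash key."""
    partitions = defaultdict(list)
    for item in items:
        partitions[partition_for(item[hash_key_attr])].append(item)
    return partitions


def query(partitions, hash_key_attr, value):
    """Look only in the single partition that can contain the hash key."""
    bucket = partitions.get(partition_for(value), [])
    return [item for item in bucket if item[hash_key_attr] == value]


# Made-up sample data: "sport" plays the role of the hash key.
items = [
    {"sport": "cricket", "season": 1, "team": "A"},
    {"sport": "cricket", "season": 2, "team": "B"},
    {"sport": "tennis", "season": 1, "team": "C"},
    {"sport": "hockey", "season": 1, "team": "D"},
]

partitions = shard(items, "sport")
cricket_items = query(partitions, "sport", "cricket")
```

Items that share a hash key always land in the same partition, and every item keeps all of
its attributes, so the query inspects one partition instead of scanning the whole table.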
Since the number of partitions is managed automatically by DynamoDB, we cannot just
hope for things to work fine. We also need to keep certain things in mind; for example, the
hash key attribute should have many distinct values. To simplify, it is not advisable to use
an attribute with only a few possible values (such as Yes or No, or Present, Past, or
Future) as the hash key, because this restricts the number of partitions. If our hash key
attribute has either Yes or No in all the items, then DynamoDB can create at most two
partitions; therefore, the specified provisioned throughput cannot be achieved.
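
A small Python sketch makes the limit visible. It is an illustration under stated
assumptions, not DynamoDB's real algorithm: MD5 modulo a partition count stands in for
the internal hash function, and partition_for, partitions_used, the count of 16
partitions, and the sample key values are all hypothetical.

```python
import hashlib


def partition_for(hash_key, num_partitions):
    """Map a hash key value to a partition index (MD5 is a stand-in hash)."""
    digest = hashlib.md5(hash_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def partitions_used(hash_keys, num_partitions):
    """Count how many distinct partitions the given hash keys occupy."""
    return len({partition_for(key, num_partitions) for key in hash_keys})


AVAILABLE = 16  # hypothetical number of partitions that could be used

# 1,000 items whose hash key is only ever "Yes" or "No".
low_cardinality = ["Yes", "No"] * 500
# 1,000 items whose hash key is different for every item.
high_cardinality = [f"player-{i}" for i in range(1000)]

low_used = partitions_used(low_cardinality, AVAILABLE)
high_used = partitions_used(high_cardinality, AVAILABLE)

print(f"Yes/No keys occupy {low_used} of {AVAILABLE} partitions")
print(f"Distinct keys occupy {high_used} of {AVAILABLE} partitions")
```

However many items are stored, the Yes/No keys can never occupy more than two partitions,
because only two distinct values are ever hashed; the distinct keys spread across many more.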
Suppose we have created a table called Tbl_Sports with a provisioned throughput capacity
of 10, and then we put 10 items into the table. Assuming that only a single partition is
created, we are able to retrieve 10 items per second. Some time later, we put 10 more items
into the table. DynamoDB will create another partition (by hashing over the hash key),
thereby satisfying the provisioned throughput capacity. The relationship is given by a
formula taken from the AWS site:
Throughput per partition = Total provisioned throughput / Number of partitions
or
Number of partitions = Total provisioned throughput / Throughput per partition
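
As a quick check of the arithmetic, the two forms of the formula can be written as small
Python functions. The function names are hypothetical, and rounding the partition count up
to a whole number is an assumption made here (a partition cannot be fractional), not
something stated by the formula itself.

```python
import math


def throughput_per_partition(total_throughput, partitions):
    """Total provisioned throughput divided evenly across the partitions."""
    return total_throughput / partitions


def partitions_needed(total_throughput, per_partition_throughput):
    """Number of partitions, rounded up to a whole number (an assumption)."""
    return math.ceil(total_throughput / per_partition_throughput)


# Tbl_Sports from the text: provisioned throughput capacity of 10.
one_partition = throughput_per_partition(10, 1)   # all capacity in one partition
two_partitions = throughput_per_partition(10, 2)  # capacity split across two

print(one_partition, two_partitions)
print(partitions_needed(10, 5))
```

With a capacity of 10, a single partition serves all 10 units, while two partitions
serve 5 units each; going the other way, 10 units at 5 units per partition give 2 partitions.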