Best Practices - Mastering DynamoDB - page 130


Evenly distributed data upload

Sometimes you need to upload data from several different sources, and uploading it so that the load is evenly distributed can be a tedious task. Some tables have both a hash key and a range key. Consider a tweets table in which you need to manage usernames and their respective tweets. You can use the username as the hash key and the tweet ID as the range key, and upload the data in the following manner:

UserName    TweetID
User1       T1
User1       T2
User1       T3
User2       T1
User2       T2
User3       T1
User3       T2

If you upload the data in this order, the write load is not distributed evenly across the nodes. As the table shows, the first few requests all write to the User1 partition, and the next few all write to the User2 partition. While User1's items are being written, the other partitions sit idle and the entire load falls on the User1 partition, which is undesirable in a distributed environment.
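To make the imbalance concrete, the following Python sketch replays the upload order from the table. The helpers `longest_same_key_run` and `hash_keys_per_batch` are hypothetical names for illustration, not part of DynamoDB or any SDK. The sketch assumes, as the text does, that all items sharing a hash key are written to the same partition, so a run of consecutive writes with the same UserName is a run of writes that all hit one partition.

```python
from itertools import groupby

# Upload order from the table: (UserName, TweetID) pairs, grouped by user.
grouped_order = [
    ("User1", "T1"), ("User1", "T2"), ("User1", "T3"),
    ("User2", "T1"), ("User2", "T2"),
    ("User3", "T1"), ("User3", "T2"),
]

def longest_same_key_run(items):
    """Length of the longest run of consecutive writes sharing one hash key."""
    # groupby starts a new group whenever the hash key (first element) changes.
    return max(
        (len(list(run)) for _, run in groupby(items, key=lambda item: item[0])),
        default=0,
    )

def hash_keys_per_batch(items, batch_size):
    """Distinct hash keys touched by each consecutive batch of writes."""
    return [
        sorted({user for user, _ in items[i:i + batch_size]})
        for i in range(0, len(items), batch_size)
    ]

print(longest_same_key_run(grouped_order))    # 3
print(hash_keys_per_batch(grouped_order, 3))  # [['User1'], ['User2', 'User3'], ['User3']]
```

The first batch of three writes touches only User1, which is the "all load on one partition" situation described above.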

A better way to design the same data upload is to write one tweet per user and then repeat the pattern. The first write request then goes to the User1 partition, the second to the User2 partition, and so on. The following table shows the required sequence of data write requests:
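The interleaved order described above can be produced mechanically. The sketch below is an illustration under stated assumptions, not the book's code: the input dict `tweets_by_user` and the function `round_robin_order` are hypothetical names, and each user's tweets are assumed to already be in memory. It takes the first tweet of every user, then the second of every user, and so on, skipping users who have run out.

```python
from itertools import chain, zip_longest

# Hypothetical input: each user's tweets, keyed by the hash key (UserName).
tweets_by_user = {
    "User1": ["T1", "T2", "T3"],
    "User2": ["T1", "T2"],
    "User3": ["T1", "T2"],
}

_MISSING = object()  # sentinel for users who have no tweet left in a round

def round_robin_order(tweets_by_user):
    """Interleave items so consecutive writes cycle through the hash keys."""
    per_user = [[(user, tweet) for tweet in tweets]
                for user, tweets in tweets_by_user.items()]
    # Each tuple from zip_longest holds the i-th item of every user (or the sentinel).
    rounds = zip_longest(*per_user, fillvalue=_MISSING)
    return [item for item in chain.from_iterable(rounds) if item is not _MISSING]

order = round_robin_order(tweets_by_user)
print(order[:3])  # [('User1', 'T1'), ('User2', 'T1'), ('User3', 'T1')]
```

With this sample data, no two consecutive writes share a hash key. If one user has many more tweets than the others, the tail of the sequence still consists of that user's items alone, so interleaving evens out the load only as far as the data allows.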
