Evenly distributed data upload
Sometimes you need to upload data from several different sources, and spreading
those writes evenly across partitions can be a tedious task. This matters most for
tables that have both a hash key and a range key. Consider a tweets table in which
you manage usernames and their respective tweets. Here you can use the username as
the hash key and the tweet ID as the range key. Suppose the data is uploaded in the
following order:
UserName    TweetID
User1       T1
User1       T2
User1       T3
User2       T1
User2       T2
User3       T1
User3       T2
Here, if you upload all the tweets of one user before moving on to the next user,
the write requests might not be distributed evenly across the partitions. As you can
see from the table, the first few requests write only to the User1 partition, and the
next few write only to the User2 partition. While the writes are happening on the
User1 partition, the other partitions sit idle and the entire load falls on the User1
partition, which is undesirable in a distributed environment.
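To make the imbalance concrete, the following minimal sketch counts the longest run of consecutive writes that share one hash key in the upload order listed above. The function name longest_same_key_run and the tuple layout (username, tweet ID) are assumptions made for illustration; they are not part of any DynamoDB API.

```python
from itertools import groupby


def longest_same_key_run(items):
    """Return the longest run of consecutive items that share a hash key.

    Each item is assumed to be a (hash_key, range_key) tuple.
    """
    # groupby splits the sequence into runs of adjacent items with equal keys.
    runs = groupby(items, key=lambda item: item[0])
    return max((sum(1 for _ in run) for _, run in runs), default=0)


# Upload order from the table: all tweets of one user, then the next user.
upload_order = [
    ("User1", "T1"), ("User1", "T2"), ("User1", "T3"),
    ("User2", "T1"), ("User2", "T2"),
    ("User3", "T1"), ("User3", "T2"),
]

print(longest_same_key_run(upload_order))  # prints 3
```

A long run means many back-to-back writes land on the same partition while the others stay idle; a value of 1 would mean no two consecutive writes share a hash key.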
A better way to design the same data upload is to write one tweet per user and then
repeat the same pattern. The first write request then goes to the User1 partition,
the second to the User2 partition, and so on. The following table shows the required
sequence of data write requests:
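As a rough illustration of this ordering, the sketch below rearranges a list of (username, tweet ID) pairs in round-robin fashion across hash keys. The helper name interleave_by_hash_key is hypothetical, and the code only reorders items in memory; how the items are then sent to the database is left out.

```python
from itertools import zip_longest


def interleave_by_hash_key(items):
    """Reorder (hash_key, range_key) tuples round-robin across hash keys."""
    # Group items by hash key, keeping first-seen key order (dicts are ordered).
    groups = {}
    for hash_key, range_key in items:
        groups.setdefault(hash_key, []).append((hash_key, range_key))

    # Take one item from each group per round; skip groups that ran out.
    missing = object()
    ordered = []
    for round_items in zip_longest(*groups.values(), fillvalue=missing):
        ordered.extend(item for item in round_items if item is not missing)
    return ordered


rows = [
    ("User1", "T1"), ("User1", "T2"), ("User1", "T3"),
    ("User2", "T1"), ("User2", "T2"),
    ("User3", "T1"), ("User3", "T2"),
]

for user, tweet in interleave_by_hash_key(rows):
    print(user, tweet)
# User1 T1
# User2 T1
# User3 T1
# User1 T2
# User2 T2
# User3 T2
# User1 T3
```

With this order, consecutive write requests target different hash keys wherever possible, so no single partition receives a long burst of writes.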