Parallel scan
DynamoDB's scan operation reads a table sequentially. For a complete table scan,
DynamoDB first retrieves up to 1 MB of data, returns it, and then goes on to scan the
next 1 MB of data, and so on. For huge tables, this is a slow and time-consuming way
of working.
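The page-by-page behavior described above can be sketched in Python. This is a minimal illustration, not DynamoDB client code: scan_all and make_fake_scan are hypothetical names, and the in-memory fake only imitates the response shape of a scan call (an Items list plus a LastEvaluatedKey while more data remains), with a fixed page size standing in for the 1 MB limit.

```python
def scan_all(scan_page):
    """Scan sequentially: fetch a page, then ask for the next one until none is left."""
    items = []
    start_key = None
    while True:
        # The first request has no start key; later ones resume after the last page.
        kwargs = {} if start_key is None else {"ExclusiveStartKey": start_key}
        page = scan_page(**kwargs)
        items.extend(page["Items"])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:  # no key means the scan reached the end of the table
            return items


def make_fake_scan(rows, page_size):
    """Hypothetical in-memory stand-in for a table; page_size plays the role of 1 MB."""
    def scan_page(ExclusiveStartKey=None):
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["pos"]
        end = start + page_size
        page = {"Items": rows[start:end]}
        if end < len(rows):
            page["LastEvaluatedKey"] = {"pos": end}
        return page
    return scan_page


if __name__ == "__main__":
    fake_scan = make_fake_scan(list(range(10)), page_size=3)
    print(scan_all(fake_scan))
```

Each request has to wait for the previous page to return, which is why a full scan of a large table takes so long when done this way.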
Although DynamoDB stores data on multiple logical partitions, a sequential scan can only
work on one partition at a time. This restriction leads to underutilization of the
provisioned throughput.
To address these issues, DynamoDB introduced the parallel scan, which divides the
table into multiple segments. Multiple threads or processes are started together, each
working on its own segment and retrieving up to 1 MB of data per request. The process
that works on a segment is called a worker. To issue a parallel scan, you need to provide
the TotalSegments value, which is simply the number of workers that will access the table
in parallel, and a Segment value that tells each worker which segment to scan.
Suppose you have three workers; you then need to invoke the scan command in the
following manner:
Scan (TotalSegments = 3, Segment = 0,..)
Scan (TotalSegments = 3, Segment = 1,..)
Scan (TotalSegments = 3, Segment = 2,..)
Here, we would logically divide the table into three segments, and each thread would scan
a dedicated segment only. The following is the pictorial representation of a parallel scan:
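In code, the same three-worker pattern can be sketched as follows. This is a minimal sketch under stated assumptions: scan_segment, parallel_scan, and make_fake_table are hypothetical names, and the in-memory fake splits rows by simple striding, which is only a stand-in for how DynamoDB actually assigns items to segments. The keyword names Segment, TotalSegments, ExclusiveStartKey, Items, and LastEvaluatedKey mirror the scan request and response fields.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(scan_page, segment, total_segments):
    """One worker: page through a single segment until it is exhausted."""
    items, start_key = [], None
    while True:
        kwargs = {"Segment": segment, "TotalSegments": total_segments}
        if start_key is not None:
            kwargs["ExclusiveStartKey"] = start_key
        page = scan_page(**kwargs)
        items.extend(page["Items"])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return items


def parallel_scan(scan_page, total_segments):
    """Start one worker thread per segment and merge their results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, scan_page, segment, total_segments)
            for segment in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]


def make_fake_table(rows, page_size):
    """Hypothetical in-memory table; striding is an assumed way to form segments."""
    def scan_page(Segment, TotalSegments, ExclusiveStartKey=None):
        segment_rows = rows[Segment::TotalSegments]
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["pos"]
        end = start + page_size
        page = {"Items": segment_rows[start:end]}
        if end < len(segment_rows):
            page["LastEvaluatedKey"] = {"pos": end}
        return page
    return scan_page


if __name__ == "__main__":
    fake_table = make_fake_table(list(range(20)), page_size=2)
    print(sorted(parallel_scan(fake_table, total_segments=3)))
```

With a real client, a scan method that accepts the same keyword arguments (for example, the scan method of a boto3 Table resource) could presumably be passed in place of the fake. Note that results arrive grouped by segment, so the merged list is not in any overall order.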