Database Reference
In-Depth Information
Parallel scanning
As we discussed in DynamoDB sharding, the table data is partitioned based on the hash
key value. Even though this sharding smooths out read and write operations, it does not
by itself let us scan the partitions in parallel. For example, if the table data is spread
over five partitions (each partition has a throughput capacity of five units), then a
sequential scan cannot consume more than five capacity units at a time, even if the table
as a whole has more than five units provisioned. The scan throughput is capped by the
partition being read, so it can never exceed that of the fastest (highest-throughput)
partition. Based on these facts, we can infer the following:
• A scan operation returns a maximum of 1 MB of data per request
• A sequential scan can read data from only one partition at a time
• For a larger table, no matter how large the provisioned throughput is, a sequential
scan will always take a long time
• The scanning speed can never be faster than the fastest (highest-throughput)
partition
To put it simply, even if our television has one hundred channels, we will be able to see
only one channel at a time.
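The one-page-at-a-time behavior described above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the helper name sequential_scan and the scan_fn parameter are assumptions made for this example. scan_fn is expected to be any callable that accepts the same keyword arguments as the scan call of the boto3 DynamoDB client (TableName, ExclusiveStartKey) and returns a dictionary with Items and, when more data remains, LastEvaluatedKey.

```python
from typing import Any, Callable, Dict, List


def sequential_scan(scan_fn: Callable[..., Dict[str, Any]],
                    table_name: str) -> List[Dict[str, Any]]:
    """Read a whole table one page at a time (hypothetical helper).

    Each call to scan_fn returns at most one page (up to 1 MB with
    DynamoDB). The next page is requested only after the previous one
    has arrived, so the pages are fetched strictly one after another.
    """
    items: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"TableName": table_name}
    while True:
        page = scan_fn(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
    return items


# Possible usage with boto3 (assumes boto3 is installed and that a
# table with the placeholder name "MyTable" exists):
#   import boto3
#   items = sequential_scan(boto3.client("dynamodb").scan, "MyTable")
```

Because every request waits for the previous page, the total scan time grows with the table size regardless of how much throughput is provisioned.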
The sequential scan works in a round-robin fashion, querying no more than one partition at
a time. This raises the question: is there any way to perform the scan across partitions
in parallel? The answer is yes. The solution is called parallel scanning. Parallel
scanning fits naturally with multithreaded programming. Before proceeding, we need to
understand one term: the segment. A segment is a logical division of the table made for
the scan operation. Each segment is scanned by its own thread, and we call these threads
worker threads.
Each worker thread issues a scan command with two parameters. The first parameter is
Segment, which uniquely identifies the segment to scan (numbered from 0 to
TotalSegments - 1), and the second parameter is TotalSegments, the total number of
segments for the scan. All the worker threads perform their scan operations
simultaneously and keep the main thread updated.
During parallel scanning, the data is divided into segments according to the
TotalSegments parameter specified when running the parallel scan.
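The worker-thread arrangement described above can be sketched in Python as follows. This is a hedged illustration under stated assumptions: the names scan_segment, parallel_scan, and scan_fn are hypothetical and introduced only for this example. scan_fn stands for any callable with the keyword interface of the boto3 DynamoDB client's scan call (TableName, Segment, TotalSegments, ExclusiveStartKey). One worker is started per segment, and the main thread collects the results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

ScanFn = Callable[..., Dict[str, Any]]


def scan_segment(scan_fn: ScanFn, table_name: str,
                 segment: int, total_segments: int) -> List[Dict[str, Any]]:
    """Worker: page through a single segment until it is exhausted."""
    kwargs: Dict[str, Any] = {
        "TableName": table_name,
        "Segment": segment,               # which segment this worker reads
        "TotalSegments": total_segments,  # how many segments exist in total
    }
    items: List[Dict[str, Any]] = []
    while True:
        page = scan_fn(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(scan_fn: ScanFn, table_name: str,
                  total_segments: int) -> List[Dict[str, Any]]:
    """Main thread: start one worker per segment and merge the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, scan_fn, table_name, s, total_segments)
            for s in range(total_segments)  # segments 0 .. total_segments - 1
        ]
        merged: List[Dict[str, Any]] = []
        for future in futures:
            merged.extend(future.result())  # re-raises any worker exception
    return merged


# Possible usage with boto3 (assumes boto3 is installed and that a
# table with the placeholder name "MyTable" exists):
#   import boto3
#   items = parallel_scan(boto3.client("dynamodb").scan, "MyTable", 3)
```

Every worker must pass the same TotalSegments value and a distinct Segment value; otherwise parts of the table would be read twice or skipped.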
A schematic diagram of parallel scanning is shown right after this paragraph. In the
diagram, the data is divided into three segments (S=0, S=1, and S=2). Each segment is
scanned by its own worker thread, and all worker threads are controlled by the
Application thread.