(0, 1, or 2). Each worker returns its query output (up to 1 MB per request) to the main
thread as soon as it gets the data.
The Segment and TotalSegments parameters need not be the same every time we perform
a parallel scan on the table. We can experiment with these two parameters and find out
which values are best suited to our requirement.
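The worker arrangement described above can be sketched as follows. This is a minimal sketch, not the book's own code: the function names scan_segment and parallel_scan are hypothetical, and client is assumed to be any object whose scan method accepts TableName, Segment, TotalSegments, and ExclusiveStartKey and returns Items and LastEvaluatedKey (as a boto3 DynamoDB client created with boto3.client("dynamodb") does).

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments):
    """Scan a single segment to completion (hypothetical helper)."""
    kwargs = {
        "TableName": table_name,
        "Segment": segment,               # this worker's segment number
        "TotalSegments": total_segments,  # same value for every worker
    }
    items = []
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items  # no more pages in this segment
        # Continue from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name, total_segments):
    """Run one worker thread per segment and merge their results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        results = []
        for future in futures:
            results.extend(future.result())
    return results
```

Calling parallel_scan(client, "MyTable", 3) would start three workers with Segment values 0, 1, and 2; the table name here is only an illustration. Because TotalSegments is an ordinary argument, trying a different value on the next run needs no other change.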
Tip
There is one more thing to keep in mind. Since three threads are scanning our table at
once, all our provisioned throughput capacity might be used. So the TotalSegments
parameter should not be set too high, as the scan might eat up all our capacity units in
one go. This is acceptable when the scan is run by a mission-critical application. But if
the same approach is used by a cold (low-priority) application, then critical applications
sharing the same table will have to wait until the parallel scan run by the cold application
is completed. To overcome this problem, we can set the Limit parameter on each worker
thread's scan request.
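One way to apply such a limit is sketched below, under stated assumptions. The function name throttled_segment_scan and the page_limit and pause_seconds parameters are hypothetical; client is assumed to expose a boto3-style scan method. The pause between requests is an addition of this sketch, not something the text prescribes.

```python
import time


def throttled_segment_scan(client, table_name, segment, total_segments,
                           page_limit=100, pause_seconds=0.5):
    """Scan one segment in small pages, pausing between requests."""
    kwargs = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
        "Limit": page_limit,  # maximum number of items evaluated per request
    }
    items = []
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key
        # Assumption: a fixed pause leaves capacity for other applications.
        time.sleep(pause_seconds)
```

A smaller page_limit or a longer pause makes the cold application gentler on the shared table at the cost of a slower scan; suitable values have to be found by experiment.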
Nothing in this world comes without a tradeoff. Parallel scanning has a lot of advantages,
but when it is used carelessly, it mostly works against us. The following guidelines
indicate when parallel scanning is a good fit:
• The table is larger than 20 GB
• Sequential scan operations leave part of the table's provisioned read throughput unused
• A normal (sequential) scan operation is too slow
These criteria tell us whether we need parallel scanning. Once we have confirmed that we
must use it, the second question is how to optimize it. The answer lies in a single
parameter: TotalSegments, which decides the number of worker threads. If we tune this
parameter well, our parallel scan will work fine.
We should choose the optimal value for this parameter. This value can be found only by
experience: by trying several values and seeing which one suits us best. AWS puts forward
a few guidelines, which are also available in the DynamoDB documentation. We will discuss
those guidelines here.
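As a starting point for that trial and error, the guideline discussed next (roughly one worker per 2 GB of data) can be turned into a small helper. This is only a sketch: the name initial_total_segments is hypothetical, and treating 1 GB as 2**30 bytes is an assumption.

```python
import math

GIB = 1024 ** 3  # assumption: "GB" is treated as 2**30 bytes


def initial_total_segments(table_size_bytes, bytes_per_segment=2 * GIB):
    """Estimate a first TotalSegments value: one segment per 2 GB, at least 1."""
    return max(1, math.ceil(table_size_bytes / bytes_per_segment))
```

The result is only a first guess; we would still raise or lower it after observing how much read capacity the scan actually consumes.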
First and foremost, we use parallel scanning only if our table size is above 20 GB
(the first guideline). As a starting point, a single worker performs the scan operation for
every 2 GB of data. This means that the number of workers (which is decided by, and is
the same as, that of