Query and Scan Operations in DynamoDB - DynamoDB Applied Design Patterns


(0, 1, or 2). Each worker returns its scan output (up to 1 MB per request) to the main thread as soon as it receives the data.

The Segment and TotalSegments parameters do not have to be the same every time we perform a parallel scan on the table. Within a single parallel scan, though, every worker must pass the same TotalSegments value. We can experiment with these two parameters to find the values best suited to our requirements.
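The worker arrangement described above can be sketched in Python as follows. This is only an illustrative sketch, not the book's own code: it assumes a client object that exposes a `scan(**kwargs)` method shaped like the boto3 DynamoDB client (accepting `TableName`, `Segment`, `TotalSegments`, and `ExclusiveStartKey`, and returning `Items` and `LastEvaluatedKey`). The function names `scan_segment` and `parallel_scan` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments):
    """Scan one segment of the table, following pagination to the end."""
    items = []
    kwargs = {
        "TableName": table_name,
        "Segment": segment,               # this worker's segment number
        "TotalSegments": total_segments,  # same value for every worker
    }
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Continue this segment from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name, total_segments=3):
    """Run one worker thread per segment and merge the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]
```

With boto3 installed and credentials configured, a call might look like `parallel_scan(boto3.client("dynamodb"), "MyTable", total_segments=3)`, where `MyTable` is a placeholder table name.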

Tip

There is one more thing to keep in mind. Because three threads are scanning the table at the same time, they can consume all of its provisioned throughput capacity. The TotalSegments parameter should therefore not be set too high, as the scan might use up all our capacity units in one go. For a mission-critical application that needs its results quickly, this approach is suitable. But if the same approach is used by a cold (low-priority) application, then critical applications sharing the same table will have to wait until the parallel scan run by the cold application completes. To overcome this problem, we can set a limit (the Limit parameter) on each worker thread's scan.
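One way to apply such a limit is sketched below in Python. It assumes a client with a boto3-style `scan(**kwargs)` method and uses the `Limit` request parameter to cap the number of items evaluated per request. The pause between pages, the default values, and the name `throttled_scan_segment` are illustrative assumptions rather than anything prescribed by the text.

```python
import time


def throttled_scan_segment(client, table_name, segment, total_segments,
                           page_limit=100, pause_seconds=0.5,
                           sleep=time.sleep):
    """Scan one segment in small pages, pausing between pages."""
    items = []
    kwargs = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
        "Limit": page_limit,  # maximum items evaluated per request
    }
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key
        # Assumed courtesy pause so other readers of the table get capacity.
        sleep(pause_seconds)
```

Smaller pages and longer pauses make the scan slower but leave more read capacity for higher-priority applications; suitable values depend on the table's provisioned throughput.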

Nothing in this world comes without a tradeoff. Parallel scanning has many advantages, but used carelessly it can do more harm than good. The following guidelines indicate when parallel scanning is appropriate:

• The table is large (20 GB or more)

• The table's provisioned read throughput is not being fully utilized

• A normal (sequential) scan operation is too slow

These points tell us whether we need parallel scanning. Once we have decided to use it, the next question is how to optimize it. The answer is that one parameter matters most: TotalSegments, which determines the number of worker threads.

We should choose the optimal value for this parameter. This value can only be determined empirically, by trying several values and seeing which one suits us best. AWS offers a few guidelines, which are also available in the DynamoDB documentation. We will discuss those guidelines here.
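Trying several values can be made systematic with a small timing harness. The Python sketch below is an assumption-laden illustration: `run_scan` stands for any callable that performs a complete parallel scan for a given TotalSegments value, and both helper names are hypothetical.

```python
import time


def time_total_segments(run_scan, candidates):
    """Return {total_segments: elapsed_seconds} for each candidate value."""
    timings = {}
    for total_segments in candidates:
        start = time.perf_counter()
        run_scan(total_segments)  # perform one full scan with this setting
        timings[total_segments] = time.perf_counter() - start
    return timings


def fastest(timings):
    """Pick the candidate with the smallest elapsed time."""
    return min(timings, key=timings.get)
```

Elapsed time is only one criterion. A setting that finishes fastest may also consume the most read capacity, so the measurements should be weighed against the needs of other applications using the table.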

First and foremost, we use parallel scanning only if the table size is above 20 GB (the first guideline). As a starting point, a single worker performs the scan operation for every 2 GB of data. This means that the number of workers (which is decided by and is the same as that of
