Exporting data from DynamoDB
When integrating with Redshift, we simply copy data from a DynamoDB table to a
Redshift table. Unlike with EMR, we need a connection from DynamoDB to Redshift only
at the time of the copy. This approach has advantages as well as disadvantages.
The good thing is that once the copy is done, Redshift queries do not consume any
DynamoDB provisioned throughput; the not-so-good thing is that we have to keep two
copies of the data, one in DynamoDB and another in Redshift.
Amazon Redshift has a powerful COPY command that can fetch data from DynamoDB
tables quickly using massively parallel processing (MPP). MPP allows Redshift to
distribute the load and fetch data in parallel. One thing to note here is that the
COPY command consumes the provisioned read throughput of the DynamoDB table, so we
have to make sure enough throughput is provisioned to avoid a provisioned
throughput exceeded exception.
Tip
It is recommended not to copy data directly from production DynamoDB tables to
Redshift. As mentioned earlier, Redshift's MPP may consume all the provisioned read
throughput, and if important production requests arrive at the DynamoDB table at
the same time, they may be throttled and disturb the application. To avoid this,
you can either create a duplicate table in DynamoDB, which is a copy of the original
table, and run the COPY command against that table, or you can set the READRATIO
parameter so that the copy uses only a limited share of the throughput.
READRATIO is a parameter of the COPY command that sets the percentage of the
DynamoDB table's provisioned throughput that Redshift should use. If you want
Redshift to utilize the full provisioned throughput of the DynamoDB table, set this
parameter to 100.
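To make the shape of such a command concrete, here is a minimal sketch in Python that only assembles the text of a COPY statement reading from DynamoDB with a capped READRATIO. The helper name build_copy_statement, the table names, and the IAM role ARN are hypothetical placeholders, and the 1 to 200 range check reflects our understanding of the values Redshift accepts rather than anything stated above. The sketch assumes trusted inputs and does no quoting or escaping.

```python
def build_copy_statement(redshift_table, dynamodb_table, iam_role_arn, read_ratio):
    """Return the text of a Redshift COPY statement that loads from DynamoDB.

    read_ratio is the percentage of the DynamoDB table's provisioned
    throughput that the copy is allowed to use.
    """
    # Assumption: Redshift accepts integer READRATIO values from 1 to 200.
    if not isinstance(read_ratio, int) or not 1 <= read_ratio <= 200:
        raise ValueError("read_ratio must be an integer between 1 and 200")
    return (
        f"COPY {redshift_table} FROM 'dynamodb://{dynamodb_table}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"READRATIO {read_ratio};"
    )


# Hypothetical names: use only half of the table's provisioned read throughput.
statement = build_copy_statement(
    "favoritemovies",
    "Movies",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    50,
)
print(statement)
```

The resulting string would then be run through whatever SQL client is connected to the Redshift cluster. A lower READRATIO leaves more read capacity for the application at the cost of a slower load.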
The COPY command works in the following manner:
• First, it matches the attributes in the DynamoDB table with columns in the
Redshift table.
• Redshift matches its column names to the DynamoDB attribute names in a
case-insensitive manner.
• The columns in Redshift that do not match any attribute in DynamoDB are set
to NULL or empty, depending on the value specified in the EMPTYASNULL option of
the COPY command.
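The matching rules above can be modelled in a few lines of Python. This is only an illustrative sketch of the behaviour, not Redshift's implementation. The function name map_item_to_row and the sample data are hypothetical, and the sketch assumes that names are compared without regard to case and that attributes differing only in case are rejected.

```python
def map_item_to_row(columns, item, empty_as_null=True):
    """Model how one DynamoDB item lines up with the Redshift columns."""
    # Index the item's attributes by lowercased name so the lookup ignores case.
    by_lower = {}
    for name, value in item.items():
        key = name.lower()
        if key in by_lower:
            # Assumption: two attributes that differ only in case are ambiguous.
            raise ValueError(f"attributes differ only in case: {name!r}")
        by_lower[key] = value
    # Unmatched columns become NULL (None) or an empty value, per the flag.
    missing = None if empty_as_null else ""
    return {col: by_lower.get(col.lower(), missing) for col in columns}


columns = ["id", "title", "rating"]
item = {"Id": 7, "Title": "Heat", "Director": "Mann"}

row_null = map_item_to_row(columns, item)
row_empty = map_item_to_row(columns, item, empty_as_null=False)
print(row_null)
print(row_empty)
```

In this model the "rating" column has no matching attribute, so it is filled with None or an empty string depending on the flag, while the "Director" attribute has no matching column and is simply not loaded.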