Integrating DynamoDB with Other AWS Components - Mastering DynamoDB

Integrating with AWS Redshift

As I keep saying, this is the data era, and every piece of data has something to tell us. To meet this need, Amazon introduced Redshift, a data warehouse as a service that lets you store your data in the cloud at low cost. Redshift provides a powerful SQL query engine that can scan terabytes, and even petabytes, of data quickly, helping users analyze data faster and more cheaply.

Now, you may be wondering how this tool could help someone whose application database is DynamoDB. The answer is quite simple: most organizations try to keep the size of their application database under control, which means they periodically purge or archive old or stale data. In such cases, it is good to have a data warehousing solution in the cloud itself. You can keep your live application data in DynamoDB and use Redshift to archive and analyze the older data.
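To make the live/archive split concrete, here is a minimal Python sketch. It assumes each item carries a numeric epoch-seconds attribute named `updated_at`, which is a hypothetical name; the book does not prescribe one. The helper `split_by_age` is also hypothetical. It only decides which items are old enough to copy to Redshift and then purge, and it does not talk to AWS.

```python
from typing import Any, Dict, List, Tuple

Item = Dict[str, Any]


def split_by_age(items: List[Item], cutoff_epoch: int,
                 ts_attr: str = "updated_at") -> Tuple[List[Item], List[Item]]:
    """Split items into (live, archive) lists.

    ts_attr is a hypothetical epoch-seconds attribute. Items older than
    cutoff_epoch become archive candidates (copy to Redshift, then purge
    from DynamoDB). Items without the attribute are kept live to be safe.
    """
    live: List[Item] = []
    archive: List[Item] = []
    for item in items:
        ts = item.get(ts_attr)
        if ts is not None and ts < cutoff_epoch:
            archive.append(item)
        else:
            live.append(item)
    return live, archive


if __name__ == "__main__":
    items = [
        {"empid": 123, "name": "XYZ", "updated_at": 1_500_000_000},
        {"empid": 111, "name": "PQR", "updated_at": 1_700_000_000},
        {"empid": 222, "name": "ABC"},  # no timestamp: stays live
    ]
    live, archive = split_by_age(items, cutoff_epoch=1_600_000_000)
    print([i["empid"] for i in live])     # [111, 222]
    print([i["empid"] for i in archive])  # [123]
```

Items that lack the timestamp are kept live so that nothing is archived by accident. Whether that is the right default depends on your data.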

Unlike DynamoDB, Redshift is a SQL-based data warehousing tool. Its SQL query engine competes strongly with other tools such as Hive, Impala, Google BigQuery, and Apache Drill. We can simply copy the data in DynamoDB to Redshift and start using it for business intelligence applications.
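The copy is normally done with Redshift's COPY command pointed at a `dynamodb://` source. The Python sketch below only assembles such a statement as a string. The function name, table names, and role ARN are hypothetical placeholders. The clauses used (IAM_ROLE, READRATIO, EMPTYASNULL) are taken from the Redshift COPY documentation as we understand it, so check them against the current reference before running the statement.

```python
def build_copy_command(redshift_table: str, dynamodb_table: str,
                       iam_role_arn: str, read_ratio: int,
                       empty_as_null: bool = True) -> str:
    """Assemble a Redshift COPY statement that reads from a DynamoDB table.

    read_ratio is the percentage of the DynamoDB table's provisioned read
    throughput the load may use. empty_as_null adds EMPTYASNULL so that
    columns with no matching attribute load as NULL instead of empty.
    """
    if read_ratio <= 0:
        raise ValueError("read_ratio must be a positive percentage")
    lines = [
        f"COPY {redshift_table} FROM 'dynamodb://{dynamodb_table}'",
        f"IAM_ROLE '{iam_role_arn}'",
        f"READRATIO {read_ratio}",
    ]
    if empty_as_null:
        lines.append("EMPTYASNULL")
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    sql = build_copy_command(
        redshift_table="employee",   # hypothetical Redshift table
        dynamodb_table="Employee",   # hypothetical DynamoDB table
        iam_role_arn="arn:aws:iam::123456789012:role/MyRedshiftRole",  # placeholder
        read_ratio=50,
    )
    print(sql)
```

A lower READRATIO leaves more read capacity for the live application while the copy runs. The resulting string would then be executed from any SQL client connected to the cluster.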

Even though DynamoDB and Redshift both come from Amazon, they are built for different purposes, so we need to take care of a few things. The following are important points to consider before using Redshift with DynamoDB:

• DynamoDB is schema-less, but Redshift needs a predefined schema to store data appropriately.

• In DynamoDB, an attribute with no value is usually just absent from the item rather than stored as a null. So we need to specify how Redshift should handle such missing or empty values.

For example, suppose our Employee table has one item {empid:123, name:XYZ} and another {empid:111, name:PQR, post:CEO}. When we copy this data to Redshift, we have to define a schema such as {empid, name, post} when creating the table. We also need to specify how Redshift should fill the post column for the first item, which has no post attribute.

• DynamoDB table names can be up to 255 characters long and can contain dots (.) and dashes (-). Redshift table names, in contrast, are limited to 127 characters and cannot contain dots or dashes.
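The considerations above can be modeled with two small Python helpers. This is a sketch under our own assumptions, not how Redshift implements COPY.

* `to_row` projects a schema-less item onto a fixed column list such as {empid, name, post}. It fills missing attributes with None (NULL) or an empty string and drops attributes that have no column.
* `to_redshift_table_name` turns a DynamoDB table name into one that fits Redshift's rules. It replaces dots and dashes with underscores, lowercases the name (Redshift table names are not case-sensitive), and truncates it to 127 characters.

Both function names are hypothetical.

```python
import re
from typing import Any, Dict, List, Optional

Item = Dict[str, Any]

REDSHIFT_MAX_NAME = 127  # Redshift's table-name length limit


def to_row(item: Item, columns: List[str],
           missing_as_null: bool = True) -> List[Optional[Any]]:
    """Project a schema-less item onto a fixed column list.

    Attributes not listed in columns are dropped. Columns the item lacks
    become None (NULL) or "" (empty), mirroring the choice the load asks for.
    """
    filler = None if missing_as_null else ""
    return [item.get(col, filler) for col in columns]


def to_redshift_table_name(dynamo_name: str) -> str:
    """Map a DynamoDB table name to a Redshift-compatible one.

    Dots and dashes become underscores, the result is lowercased, and it is
    cut to 127 characters (DynamoDB names are ASCII, so chars == bytes).
    """
    name = re.sub(r"[.\-]", "_", dynamo_name).lower()
    return name[:REDSHIFT_MAX_NAME]


if __name__ == "__main__":
    columns = ["empid", "name", "post"]
    employees = [
        {"empid": 123, "name": "XYZ"},
        {"empid": 111, "name": "PQR", "post": "CEO"},
    ]
    for item in employees:
        print(to_row(item, columns))
    # [123, 'XYZ', None]
    # [111, 'PQR', 'CEO']
    print(to_redshift_table_name("Prod.Employee-2014"))  # prod_employee_2014
```

Truncation can make two long DynamoDB names collide. In practice, you may prefer to choose Redshift table names by hand.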
