Integrating with AWS Redshift
As I keep saying, this is the data era, and every piece of data has something to tell
us. To meet this need, Amazon introduced Redshift, a data warehouse as a service that
lets you store your data in the cloud at low cost. Redshift offers a powerful SQL query
engine that can scan terabytes and even petabytes of data quickly, so users can analyze
data faster and more cheaply.
Now, you must be wondering how this tool could help someone whose application database
is DynamoDB. The answer is quite simple: most organizations try to keep the size of their
application database manageable, which means that they purge or archive old or stale data
periodically. In such cases, it is good to have a data warehousing solution in the cloud
itself. You can keep your live application data in DynamoDB and move old data to Redshift
to archive and analyze it.
Unlike DynamoDB, Redshift is a SQL-based data warehousing tool. It comes with a
powerful SQL query engine that competes strongly with tools such as Hive, Impala,
Google BigQuery, and Apache Drill. We can simply copy the data in DynamoDB to Redshift
and start using it for business intelligence applications.
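As a rough illustration of that copy step, Redshift's COPY command can read directly from
a DynamoDB table through a dynamodb:// source. The minimal Python sketch below only builds
the SQL text and does not connect to anything. The helper name build_copy_sql, the table
names, and the IAM role ARN are hypothetical, and the READRATIO value (the percentage of
the DynamoDB table's provisioned read throughput that the load may consume) is an
arbitrary choice.

```python
def build_copy_sql(redshift_table: str, dynamodb_table: str,
                   iam_role_arn: str, read_ratio: int = 50) -> str:
    """Return a Redshift COPY statement that loads from a DynamoDB table.

    All arguments are assumed to come from trusted configuration; this
    sketch does no escaping of the values it interpolates.
    """
    # READRATIO is a percentage of the source table's provisioned read capacity.
    if not 1 <= read_ratio <= 200:
        raise ValueError("READRATIO must be between 1 and 200")
    return (
        f"COPY {redshift_table} "
        f"FROM 'dynamodb://{dynamodb_table}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"READRATIO {read_ratio};"
    )


# Hypothetical names, for illustration only.
example_sql = build_copy_sql(
    "employee",
    "Employee",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(example_sql)
```

The resulting statement would then be run against the cluster with whatever SQL client you
already use. A lower READRATIO leaves more read capacity for the live application at the
cost of a slower load.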
Even though both DynamoDB and Redshift come from Amazon, we need to take care of a
few things, as the two tools are meant for different purposes. The following are a few
important things to consider before using Redshift with DynamoDB:
• DynamoDB is schema-less, but Redshift needs a predefined schema to store data in
an appropriate manner.
• A DynamoDB item simply omits an attribute that has no value instead of storing a
null in a fixed column, so we need to specify how Redshift should handle attributes
with missing or empty values.
For example, suppose that in an Employee table we have one item {empid:123,
name:XYZ} and another {empid:111, name:PQR, post:CEO}. When we copy this
data to Redshift, we have to specify a schema such as {empid, name, post} when
creating the table. We also need to specify how Redshift should handle the value
of the post attribute for the first item, which does not have one.
• Also, DynamoDB table names can be up to 255 characters long and can contain dots (.)
and dashes (-), whereas Redshift table names can be at most 127 characters long and
cannot contain dots (.) or dashes (-).
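The considerations above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not how Redshift itself performs the load: the
helper names (to_redshift_table_name, union_schema, to_rows) and the table name
prod.Employee-archive are hypothetical, replacing dots and dashes with underscores is just
one possible naming convention, and missing attributes are represented as None, which a
database driver would normally send as SQL NULL.

```python
import re

REDSHIFT_MAX_NAME_LEN = 127  # limit quoted in the list above


def to_redshift_table_name(dynamodb_name: str) -> str:
    """Replace dots and dashes with underscores, lowercase, and cap the length.

    Assumption: this naming convention is ours. Different DynamoDB names can
    map to the same result, and reserved words are not checked here.
    """
    name = re.sub(r"[.\-]", "_", dynamodb_name).lower()
    return name[:REDSHIFT_MAX_NAME_LEN]


def union_schema(items):
    """Collect every attribute name seen across the items, in first-seen order."""
    columns = []
    for item in items:
        for key in item:
            if key not in columns:
                columns.append(key)
    return columns


def to_rows(items, columns):
    """Build one tuple per item, using None where an item lacks an attribute."""
    return [tuple(item.get(col) for col in columns) for item in items]


# The two Employee items from the example above.
items = [
    {"empid": 123, "name": "XYZ"},
    {"empid": 111, "name": "PQR", "post": "CEO"},
]

columns = union_schema(items)
rows = to_rows(items, columns)

print(columns)  # ['empid', 'name', 'post']
print(rows)     # [(123, 'XYZ', None), (111, 'PQR', 'CEO')]
print(to_redshift_table_name("prod.Employee-archive"))  # prod_employee_archive
```

The first item ends up with None in the post column, which is the decision the second
point asks you to make explicitly. You could equally choose an empty string or a default
value, as long as the choice is consistent across the table.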