Database Reference
In-Depth Information
Amazon Redshift
Provider: Amazon under the Amazon Web Services platform
Web Site: http://aws.amazon.com/redshift
Platforms: This tool only runs in the cloud on Amazon Web Services.
Technology Overview: Amazon Redshift is a fully managed, cloud-based data warehouse service. You
can think of it as a kind of Dropbox for databases, ranging from a few hundred gigabytes to a petabyte
or more. Redshift is built to make it easy to scale up the underlying hardware as data needs
grow. It also uses compression and a columnar storage design, which suits analytics queries
and supports very large data volumes.
Pros: Amazon Redshift is a production-grade tool that delivers on its promises. Getting up and running
with it is easy, and managing it is even easier.
Cons: It is only available in the cloud, with no option for on-site installation. Connecting to Amazon
Redshift with Microsoft tools from behind a firewall is a challenge, although most cloud-based big data
SQL tools share this challenge.
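The columnar design and compression mentioned in the Redshift overview can be illustrated with a toy sketch. This is not Redshift's actual storage format or API. The sample rows, the column names, and the helper functions `to_columns` and `run_length_encode` are all hypothetical, and run-length encoding is used only as one simple example of column compression. The sketch shows why an aggregate query reads a single column, and why a column of repeated values compresses well.

```python
from itertools import groupby

# Hypothetical sample rows: (order_id, region, amount)
rows = [
    (1, "east", 10.0),
    (2, "east", 20.0),
    (3, "west", 5.0),
    (4, "west", 15.0),
]


def to_columns(rows, names):
    """Pivot row tuples into a dict of column lists (a toy columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Compress consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, ["order_id", "region", "amount"])

# An aggregate such as SUM(amount) only has to read the "amount" column,
# not every field of every row.
total_amount = sum(columns["amount"])

# Values within one column tend to repeat, so they compress well.
encoded_region = run_length_encode(columns["region"])

print(total_amount)
print(encoded_region)
```

In a row-oriented layout, the same sum would have to scan every full row, which is why column stores tend to suit analytics workloads.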
Hortonworks Hive
Provider: Hortonworks (partners with Microsoft)
Web Site: http://hortonworks.com
Platforms: This tool is available on Hadoop and can be cloud- or on-premises-based.
Technology Overview: Hive is the open-source SQL implementation on Hadoop. Until recently, it
did not allow real-time SQL queries. This has changed with the Stinger initiative, whose
objective is to enable real-time SQL queries in Hive.
Pros: On-premises or cloud installation. Allowing on-premises installation is critical if your organization
has a no-cloud policy.
Cons: As of this writing, Hive is still relatively new, and the real-time engine is in beta release.
Cloudera Impala
Provider: Cloudera
Web Site: www.cloudera.com
Platforms: This tool is a real-time SQL engine developed by Cloudera that runs on the
Hadoop platform.
Technology Overview: Impala is a SQL engine designed for analytics and high scalability.
It can read traditional Hadoop file formats and can span multiple Hadoop nodes. It is written
in C++ rather than Java for performance, and it does not translate SQL into MapReduce jobs.
Pros: On-premises or cloud installation. Allowing on-premises installation is critical if your organization
has a no-cloud policy.
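To make the phrase "translate SQL into MapReduce" in the Impala overview concrete, the following toy sketch runs a query equivalent to SELECT region, SUM(amount) FROM orders GROUP BY region in two ways: as explicit map, shuffle, and reduce phases, and as a single direct aggregation pass. It runs in memory only and uses no Hadoop or Impala APIs. The `orders` data and all function names are hypothetical, and the direct version only illustrates the idea of skipping the MapReduce stages, not how Impala's engine actually works.

```python
from collections import defaultdict

# Hypothetical input records: (region, amount)
orders = [("east", 10.0), ("west", 5.0), ("east", 20.0), ("west", 15.0)]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    return [(region, amount) for region, amount in records]


def shuffle_phase(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single sum."""
    return {key: sum(values) for key, values in grouped.items()}


def direct_aggregate(records):
    """Aggregate in one pass without materializing intermediate pairs."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)


mapreduce_result = reduce_phase(shuffle_phase(map_phase(orders)))
direct_result = direct_aggregate(orders)

print(mapreduce_result)
print(direct_result)
```

Both paths produce the same totals. The difference is that the staged version materializes intermediate results between phases, which on a real cluster would add latency.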
 