Database Reference
In-Depth Information
Amazon Redshift
Provider:
Amazon, as part of the Amazon Web Services (AWS) platform
Web Site:
http://aws.amazon.com/redshift
Platforms:
This tool only runs in the cloud on Amazon Web Services.
Technology Overview:
Amazon Redshift is a fully managed, cloud-based data warehouse service. You
can think of it as a kind of Dropbox for databases, holding anywhere from a few hundred gigabytes to a
petabyte or more without you having to manage the underlying servers. Redshift is built to make it easy
to add hardware capacity as data needs grow. It also uses compression and a columnar storage design,
which is best suited to analytics queries and can support very large data volumes.
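To illustrate why a columnar layout with compression suits analytics, here is a minimal Python sketch of the general idea. It is not Redshift's actual storage format: the sample table, its column names, and the helper functions `to_columns`, `rle_encode`, and `rle_decode` are hypothetical, and run-length encoding is used only as one example of a compression scheme that works well on columns with repeated values.

```python
from itertools import groupby

# Hypothetical sample table: (region, product, amount)
rows = [
    ("east", "widget", 10),
    ("east", "widget", 15),
    ("east", "gadget", 7),
    ("west", "widget", 3),
    ("west", "gadget", 12),
]

def to_columns(rows, names):
    """Pivot row tuples into one list per column (a columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

columns = to_columns(rows, ["region", "product", "amount"])

# Runs of repeated values in a column collapse into a few (value, count) pairs.
encoded_region = rle_encode(columns["region"])

# An analytics query such as SUM(amount) only needs the 'amount' column,
# so the region and product values never have to be read.
total = sum(columns["amount"])
```

In a row-oriented layout, the same `SUM(amount)` would have to scan every field of every row; storing each column contiguously lets the engine read, and compress, only what the query touches.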
Pros:
Amazon Redshift is a production-grade tool that delivers on its promises. Getting up and running
with it is easy, and managing it is even easier.
Cons:
It is only available in the cloud, with no option for on-premises installation. Connecting to Amazon
Redshift with Microsoft tools from behind a corporate firewall is a challenge, although most cloud-based
big data SQL tools share this challenge.
Hortonworks Hive
Provider:
Hortonworks (partners with Microsoft)
Web Site:
http://hortonworks.com
Platforms:
This tool is available on Hadoop and can be cloud- or on-premises-based.
Technology Overview:
Hive is an open-source SQL implementation on Hadoop. Until recently, it
did not support real-time SQL queries. This has changed with the Stinger initiative, whose
objective is to enable Hive to run real-time (interactive) SQL queries.
Pros:
On-premises or cloud installation. Allowing on-premises installation is critical if your organization
has a no-cloud policy.
Cons:
As of this writing, Hive's real-time capabilities are still relatively new, and the real-time engine is in beta.
Cloudera Impala
Provider:
Cloudera
Web Site:
http://www.cloudera.com
Platforms:
This tool is a real-time SQL engine that was developed by Cloudera and sits on the
Hadoop platform.
Technology Overview:
Impala is an open-source SQL engine developed by Cloudera and designed for analytics
and high scalability. It can read traditional Hadoop file formats and can scale across multiple Hadoop
nodes. It is written in C++ rather than Java for performance, and it does not translate SQL into MapReduce jobs.
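The difference between translating SQL into MapReduce and executing it directly can be sketched with a simple `GROUP BY` aggregation. This Python example is only an illustration of the shape of the two approaches, not Impala's C++ engine or Hadoop's API: the records and the functions `mapreduce_sum` and `direct_hash_sum` are hypothetical. On a real cluster, the map and shuffle stages write intermediate results to disk and move them over the network, which is a large part of why MapReduce-based SQL is slow for interactive queries; the sketch keeps everything in memory.

```python
from collections import defaultdict

# Hypothetical input: (region, amount) records, e.g. rows read from a Hadoop file.
records = [("east", 10), ("west", 3), ("east", 15), ("west", 12), ("east", 7)]

def mapreduce_sum(records):
    """SUM(amount) GROUP BY region, expressed as MapReduce-style stages."""
    # Map stage: emit (key, value) pairs (materialized to disk in real MapReduce).
    mapped = [(region, amount) for region, amount in records]
    # Shuffle stage: sort and group values by key (network + disk on a cluster).
    shuffled = defaultdict(list)
    for key, value in sorted(mapped):
        shuffled[key].append(value)
    # Reduce stage: aggregate each key's list of values.
    return {key: sum(values) for key, values in shuffled.items()}

def direct_hash_sum(records):
    """The same query as a single in-memory hash-aggregation pass."""
    totals = defaultdict(int)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)

print(direct_hash_sum(records))
```

Both functions return the same totals; the direct version simply skips the intermediate stages, which is the general idea behind engines that execute SQL natively instead of generating MapReduce jobs.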
Pros:
On-premises or cloud installation. Allowing on-premises installation is critical if your organization
has a no-cloud policy.