developed into a robust ecosystem for storing and processing large amounts
of data. A number of tools have been built on top of Hadoop to make it
easier to use; a number of companies have been formed to help customers
use Hadoop, and a lot of work has gone into improving its performance.
Several BigQuery customers have migrated from Hadoop to BigQuery to
perform faster, interactive queries over their data. They often still have
a significant amount of “business logic” in their Hadoop pipelines,
transforming the raw data, anonymizing it, cleaning it, and so on. For these
customers, BigQuery doesn't replace Hadoop; it complements it.
You can, of course, run your Hadoop cluster anywhere: on premises, on
Amazon Elastic Compute Cloud (EC2), and so on. But given that you are
MapReducing over data that will either come from or go to BigQuery, it will
usually be more efficient to perform the computation near where the data
is stored to minimize having to copy it across the Internet (which can be
expensive in terms of both time and money). For that reason, we discuss
running your Hadoop cluster on Google Compute Engine.
Hadoop on Google Compute Engine (GCE)
You don't need any additional support to run Hadoop on Google Compute
Engine; the GCE API is quite robust and it isn't hard to write a script
to manage a cluster to run your Hadoop jobs. There are even third-party
companies, such as Qubole, that have built products out of Hadoop cluster
management. Hadoop is a key part of cloud computing, and the
performance of Hadoop on GCE is a key differentiator for Google's Cloud.
To make it easier to run Hadoop on GCE, Google has released special
cluster-management tools and data connectors to talk to Google data
sources and sinks.
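
As a rough illustration of the kind of do-it-yourself cluster script mentioned above, the following Python sketch builds one `gcloud compute instances create` command per node. It is a minimal sketch under stated assumptions, not the Google-provided tooling: the function name, the node-naming convention, and the default zone and machine type are all hypothetical, and the sketch only prepares the commands for creating the virtual machines. Installing and configuring Hadoop on them is not covered.

```python
from typing import List


def cluster_create_commands(
    cluster: str,
    workers: int,
    zone: str = "us-central1-a",           # assumed default, adjust as needed
    machine_type: str = "n1-standard-4",   # assumed default, adjust as needed
) -> List[List[str]]:
    """Build one `gcloud compute instances create` command per cluster node.

    The naming scheme (<cluster>-master, <cluster>-worker-N) is a
    hypothetical convention chosen for this example.
    """
    if workers < 1:
        raise ValueError("a Hadoop cluster needs at least one worker")

    # One master node followed by the requested number of workers.
    names = [f"{cluster}-master"]
    names += [f"{cluster}-worker-{i}" for i in range(workers)]

    # Each command is an argv list, suitable for subprocess.run(cmd, check=True).
    return [
        ["gcloud", "compute", "instances", "create", name,
         "--zone", zone, "--machine-type", machine_type]
        for name in names
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is created.
    for cmd in cluster_create_commands("demo-hadoop", workers=2):
        print(" ".join(cmd))
```

Returning the commands as lists rather than executing them keeps the sketch safe to run and easy to inspect. A real script would pass each list to `subprocess.run` and then handle Hadoop installation and teardown.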
These Hadoop-on-GCE tools are relatively new at the time of
publication; so far, they are merely a set of scripts that can create and
manage a Hadoop cluster for you. They also include connectors that allow
you to access data in App Engine Datastore, Google Cloud Storage, and
BigQuery. Because these tools have not yet been released to the public at
the time of this writing, we don't provide a walkthrough of how to use
them. We mention only that they will be available and that, if you are a
Hadoop user, options exist for running Hadoop over BigQuery data. For more
information, see https://developers.google.com/hadoop/bigquery-connector