Figure 2.3 Google Cloud Platform and BigQuery
Google Compute Engine (GCE)
Google Compute Engine is the solution for people who want to control all
aspects of the software stack. You get a standard Linux virtual machine
running in Google's cloud. You can install anything you'd like on that
machine and run it however you choose. You pay for the amount of virtual
hardware you need and the amount of data transferred out of the system.
Because you can run any software you like on virtually any number of
machines, you can run other Big Data analysis and transformation suites.
This can be complementary to BigQuery. For example, you can use the
BigQuery Hadoop connector to read your BigQuery tables, process them
with Hadoop, and write them back to BigQuery. This is a popular option
for customers who want to perform transformations on their data that are
difficult or impossible to do in BigQuery SQL. This is also a good way to do
Extract, Transform, and Load (ETL) operations to translate source data into
a format that can be easily ingested by BigQuery.
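As a rough illustration of the ETL idea, the sketch below converts CSV text into newline-delimited JSON, one of the formats BigQuery can load. It is a minimal example under stated assumptions, not the Hadoop connector workflow itself: the function name `csv_to_ndjson`, the sample columns, and the cleanup rules (trim whitespace, turn empty fields into nulls) are all hypothetical choices made for illustration.

```python
import csv
import io
import json


def csv_to_ndjson(csv_text):
    """Convert CSV text with a header row to newline-delimited JSON.

    Hypothetical helper for illustration only. It assumes every row has
    no more fields than the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        record = {}
        for key, value in row.items():
            # Trim whitespace; treat missing or empty fields as null.
            cleaned = (value or "").strip()
            record[key.strip()] = cleaned or None
        # One JSON object per line, keys sorted for stable output.
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data, not taken from the text.
    sample = "name,city\nAda,London\nGrace,\n"
    print(csv_to_ndjson(sample))
```

On a Compute Engine instance, a script along these lines could run over source files before they are loaded. Real pipelines would usually also need type conversion and error handling, which this sketch omits.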
If you prefer, of course, you could run Impala, Presto, or Apache Drill
on your Google Compute Engine instances. We believe that there are
performance and manageability advantages to BigQuery. However,
alternatives do exist, even within the Google cloud. If you choose to use a
non-Google cloud, the most comparable alternative is Amazon EC2.
Chapter 12, “External Data Processing,” briefly describes using Google
Compute Engine to run Hadoop over BigQuery data. The same example
would likely work well in Amazon's cloud, although performance might
suffer because the data has to cross the public Internet.
Google Cloud Storage (GCS)
As mentioned, BigQuery is a structured storage system, which is great if
your data is organized into rows and columns. Google Cloud Storage,