Chapter 6
Loading Data
Before you can begin to slice, dice, and roll up your data in BigQuery, you
have to get the data into the service. In Chapter 3, “Getting Started with
BigQuery,” you worked through a simplified example of loading data to verify
that billing was correctly enabled on your account. Unfortunately, loading
data is not usually quite so simple. In that example, a file hosted in Google
Cloud Storage was available in a format understood by BigQuery, and you
were supplied with a schema that matched the data. When you need to load
your own data into the service, you need to handle each of these steps yourself.
This is not to imply that loading data is especially difficult; rather, it is
an important part of using the service that is sometimes overlooked.
There are two distinct pieces to the process of loading data into BigQuery:
• Formatting your data appropriately
• Transferring the data to BigQuery
In most scenarios the data you need to analyze lives in a system you control:
files on your computer, records in a database, or logs from hosted servers,
to name a few. The first task is to extract the data from the systems in a
form that BigQuery can accept. In some cases this is trivial because the data
happens to be in a suitable format such as a CSV file on your machine, but
in other cases it might require some massaging or an extraction (the E in
Extract-Transform-Load) from a database. With installed software you might
be done at this point because the application and data usually reside on the
same machine or network. With cloud services there is an additional step:
the data needs to be shipped to the service. With ever-increasing bandwidth
this is becoming less of an issue, but there are still data volumes at which it
becomes important to plan how you move bytes around.
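As an illustration of the extraction step, the following is a minimal sketch of pulling rows out of a database and writing them to a CSV file on the local machine. It is only one possible approach and makes several assumptions: the source is a SQLite database, the table name visits and its columns are invented for the example, and the function names export_query_to_csv and build_demo_db are hypothetical helpers rather than part of any BigQuery tool. Whether a header row belongs in the file depends on how the load is configured, so it is left as an option here.

```python
import csv
import sqlite3


def export_query_to_csv(conn, query, out_path, include_header=True):
    """Write the rows returned by `query` to a CSV file; return the row count."""
    cursor = conn.execute(query)
    # cursor.description holds one tuple per result column; item 0 is the name.
    column_names = [col[0] for col in cursor.description]
    row_count = 0
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if include_header:
            writer.writerow(column_names)
        for row in cursor:
            # csv.writer quotes fields that contain commas, quotes, or newlines.
            writer.writerow(row)
            row_count += 1
    return row_count


def build_demo_db():
    """Create a small in-memory database (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (id INTEGER, page TEXT, ms INTEGER)")
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?, ?)",
        [(1, "/home", 120), (2, "/docs, intro", 340)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    written = export_query_to_csv(
        demo, "SELECT id, page, ms FROM visits ORDER BY id", "visits.csv"
    )
    print(f"wrote {written} rows to visits.csv")
```

The sketch covers only the formatting half of the process. The resulting file still has to be transferred to the service, which is the second task described above.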
The aim of this chapter is to give you an in-depth understanding of
BigQuery's capabilities for ingesting data. The material is organized around
the two tasks described above. It may be that your use case allows for
straightforward loading that does not rely on any of the advanced options.
But if this is not the case, you will be equipped to select and implement an
appropriate solution for your data pipeline into BigQuery.