Loading Data - Google BigQuery Analytics


performing is idempotent (repeated application does not affect the state)

or it is acceptable that only one of them succeeds, it is possible to avoid

explicit coordination for ID selection and instead rely on write dispositions

to produce the desired behavior. Previously you saw an example with

WRITE_EMPTY, which ensures that only one job ends up modifying the

table. Another example is the use of CREATE_NEVER with jobs that create

the table. Again only one job can successfully create the table. If

WRITE_TRUNCATE is used by multiple jobs updating the same table, the

table usually contains the data loaded by the most recent job. BigQuery does

not guarantee the order of completion for concurrently running jobs, but

generally they complete in the order they were submitted if they are doing

a similar amount of processing. Creative use of these dispositions is covered

in more detail in Chapter 11, “Managing Data Stored in BigQuery,” which

deals with strategies for managing data in BigQuery.
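
The behavior described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not BigQuery's implementation: the `apply_load` and `run_jobs` helpers and the `DispositionError` class are hypothetical names, and the jobs run one after another instead of concurrently. The disposition strings are the ones named in the text, plus WRITE_APPEND and CREATE_IF_NEEDED.

```python
# Toy in-memory model of write and create dispositions.
# Hypothetical helpers for illustration; not BigQuery's implementation.

class DispositionError(Exception):
    """Raised when a job's disposition forbids the write."""


def apply_load(tables, table_id, rows, write_disposition="WRITE_APPEND",
               create_disposition="CREATE_IF_NEEDED"):
    """Apply one simulated load job to the `tables` dict."""
    if table_id not in tables:
        if create_disposition == "CREATE_NEVER":
            raise DispositionError("table does not exist: " + table_id)
        tables[table_id] = []
    existing = tables[table_id]
    if write_disposition == "WRITE_EMPTY":
        # Only a job that finds the table empty may write to it.
        if existing:
            raise DispositionError("table is not empty: " + table_id)
        tables[table_id] = list(rows)
    elif write_disposition == "WRITE_TRUNCATE":
        # Replace whatever was there; the last job to finish wins.
        tables[table_id] = list(rows)
    elif write_disposition == "WRITE_APPEND":
        tables[table_id] = existing + list(rows)
    else:
        raise ValueError("unknown write disposition: " + write_disposition)
    return tables[table_id]


def run_jobs(tables, jobs):
    """Run jobs in the given order; return the names of those that succeeded."""
    succeeded = []
    for name, kwargs in jobs:
        try:
            apply_load(tables, **kwargs)
            succeeded.append(name)
        except DispositionError:
            pass
    return succeeded


# Two uncoordinated jobs attempt the same idempotent load.
tables = {}
jobs = [
    ("job_a", dict(table_id="logs", rows=[{"n": 1}], write_disposition="WRITE_EMPTY")),
    ("job_b", dict(table_id="logs", rows=[{"n": 1}], write_disposition="WRITE_EMPTY")),
]
winners = run_jobs(tables, jobs)
```

In this model only the first job writes to the table, so the rows are loaded once even though neither job coordinated with the other. Swapping both jobs to WRITE_TRUNCATE would let both succeed, with the table holding the rows of whichever ran last.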

Data Formats

You learned how to transfer your data to BigQuery and how to control

where the data ends up. This section describes how the service interprets the

data you transfer. Currently BigQuery supports three different data formats:

CSV, newline-delimited JSON, and AppEngine Datastore backups. CSV is

more like a family of related formats, whereas the other two formats are

more strictly specified.

Generally, the choice of format is determined by the application that is

producing your data or where you have it stored. However, not all the

formats support the full range of types and modes available in BigQuery

schemas. When a format cannot express a type or mode your schema uses, the

schema itself narrows the choice of suitable formats.
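
To make the difference between formats concrete, here is a minimal sketch that writes the same records as newline-delimited JSON and as CSV using Python's standard library. The sample records and the `to_ndjson` and `to_csv` helpers are hypothetical. The point it illustrates, assuming a schema with a repeated field, is that the JSON form carries the repeated values while the flat CSV form has no natural place for them.

```python
import csv
import io
import json

# Hypothetical sample data: "emails" is a repeated field.
records = [
    {"name": "alice", "age": 30, "emails": ["a@example.com", "alice@example.com"]},
    {"name": "bob", "age": 25, "emails": []},
]


def to_ndjson(rows):
    """Serialize one JSON object per line; repeated values survive intact."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)


def to_csv(rows, columns):
    """Serialize flat columns only; each value is written as a single field."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return out.getvalue()


ndjson_text = to_ndjson(records)
# The repeated "emails" field is left out because a CSV row is flat.
csv_text = to_csv(records, ["name", "age"])
```

Each line of `ndjson_text` parses back into the original record, whereas `csv_text` keeps only the scalar columns.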

CSV

It is a bit generous to say the CSV format was designed. There is a

specification of the format available (http://tools.ietf.org/html/rfc4180),

but in practice a lot of CSV encountered in the wild is actually

some variant of this basic standard. However, for better or worse, it is the de

facto standard for data interchange and is supported by almost every data

processing tool. BigQuery supports the basic format and has a number of

flags to adjust parsing so that it can support common variants.
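
As a rough illustration of what adjusting the parse for a variant means, the sketch below reads a pipe-delimited file with a header row using Python's csv module. It is not BigQuery's parser. The `parse_csv_variant` helper is hypothetical, and its parameter names are chosen to echo the `fieldDelimiter`, `quote`, and `skipLeadingRows` properties of a load job configuration in the BigQuery REST API.

```python
import csv
import io


def parse_csv_variant(text, field_delimiter=",", quote='"', skip_leading_rows=0):
    """Parse CSV text with a custom delimiter and quote character,
    dropping the given number of leading rows (for example a header)."""
    reader = csv.reader(io.StringIO(text), delimiter=field_delimiter,
                        quotechar=quote)
    rows = list(reader)
    return rows[skip_leading_rows:]


# Hypothetical input: pipe-delimited, one header row, one quoted field
# that itself contains the delimiter.
pipe_text = 'name|comment\nalice|"likes | pipes"\nbob|plain\n'
rows = parse_csv_variant(pipe_text, field_delimiter="|", skip_leading_rows=1)
```

Here `rows` holds the two data rows with the header removed, and the quoted field keeps its embedded pipe character as part of the value.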
