performing is idempotent (applying it more than once has the same effect as applying it once)
or it is acceptable that only one of them succeeds, it is possible to avoid
explicit coordination for ID selection and instead rely on write dispositions
to produce the desired behavior. Previously you saw an example with
WRITE_EMPTY, which ensures that only one job ends up modifying the
table. Another example is the use of CREATE_NEVER with jobs that create
the table. Again, only one job can successfully create the table. If
WRITE_TRUNCATE is used by multiple jobs updating the same table, the
table ends up containing the data loaded by whichever job completes last,
which is usually the most recently submitted one. BigQuery does not
guarantee the order of completion for concurrently running jobs, but they
generally complete in the order they were submitted if they do a similar
amount of processing. Creative use of these dispositions is covered in more
detail in Chapter 11, “Managing Data Stored in BigQuery.”
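The behavior of the write dispositions described above can be illustrated with a small in-memory model. This is only a sketch of the semantics, not BigQuery client code: the names `run_load_job`, `run_jobs`, and `DispositionError` are invented for the illustration, a table is modeled as a plain list of rows, and jobs are assumed to complete in the order they are listed.

```python
class DispositionError(Exception):
    """Raised when a write disposition rejects a simulated job."""


def run_load_job(tables, table_id, rows, write_disposition):
    """Apply one simulated load job to `tables`, a dict of table_id -> rows."""
    existing = tables.get(table_id, [])
    if write_disposition == "WRITE_EMPTY":
        # Succeeds only if the table holds no data yet.
        if existing:
            raise DispositionError("table %r already contains data" % table_id)
        tables[table_id] = list(rows)
    elif write_disposition == "WRITE_TRUNCATE":
        # Replaces whatever the table held before.
        tables[table_id] = list(rows)
    elif write_disposition == "WRITE_APPEND":
        # Adds the new rows after the existing ones.
        tables[table_id] = existing + list(rows)
    else:
        raise ValueError("unknown write disposition: %r" % write_disposition)


def run_jobs(tables, table_id, batches, write_disposition):
    """Run one job per batch, in completion order; return the indexes that succeeded."""
    succeeded = []
    for index, rows in enumerate(batches):
        try:
            run_load_job(tables, table_id, rows, write_disposition)
            succeeded.append(index)
        except DispositionError:
            pass  # This job lost the race; the table is left untouched.
    return succeeded


if __name__ == "__main__":
    batches = [["a1", "a2"], ["b1"]]
    for disposition in ("WRITE_EMPTY", "WRITE_TRUNCATE", "WRITE_APPEND"):
        tables = {}
        winners = run_jobs(tables, "logs", batches, disposition)
        print(disposition, winners, tables["logs"])
```

In this model, WRITE_EMPTY lets only the first job to complete modify the table, and WRITE_TRUNCATE leaves the table holding the rows of the job that completed last, so neither case needs explicit coordination between the jobs.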
Data Formats
You learned how to transfer your data to BigQuery and how to control
where the data ends up. This section describes how the service interprets the
data you transfer. Currently, BigQuery supports three data formats:
CSV, newline-delimited JSON, and App Engine Datastore backups. CSV is
more like a family of related formats, whereas the other two formats are
more strictly specified.
Generally, the choice of format is determined by the application that is
producing your data or where you have it stored. However, not all the
formats support the full range of types and modes available in BigQuery
schemas. In such cases, the schema imposes an additional constraint on
which formats are suitable.
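To make the schema constraint concrete, consider a record that contains a repeated, nested field. The sketch below serializes such records as newline-delimited JSON using only the Python standard library. The field names (`name`, `visits`, `page`, `ms`) and the helper functions are made up for the illustration. A flat format such as CSV has no natural place for the repeated `visits` field, whereas each JSON line carries it directly.

```python
import json

# Hypothetical records: "visits" is a repeated field of nested records.
records = [
    {"name": "alice", "visits": [{"page": "/home", "ms": 120},
                                 {"page": "/docs", "ms": 45}]},
    {"name": "bob", "visits": []},
]


def to_ndjson(items):
    """Serialize each record as one JSON object per line."""
    return "".join(json.dumps(item, sort_keys=True) + "\n" for item in items)


def from_ndjson(text):
    """Parse newline-delimited JSON back into a list of records."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    encoded = to_ndjson(records)
    print(encoded, end="")
    assert from_ndjson(encoded) == records
```

Each record occupies exactly one line, so a reader can split the input on newlines and parse every line independently.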
CSV
It is a bit generous to say the CSV format was designed. There is a
specification of the format available (http://tools.ietf.org/html/rfc4180),
but in practice a lot of CSV encountered in the wild is actually
some variant of this basic standard. However, for better or worse, it is the de
facto standard for data interchange and is supported by almost every data
processing tool. BigQuery supports the basic format and has a number of
flags to adjust parsing so that it can support common variants.
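The idea of one basic format plus adjustable parsing options can be sketched with Python's standard `csv` module. This is an analogy rather than BigQuery's own parser: the `parse_csv` helper and its parameter names (`delimiter`, `quotechar`, `skip_leading_rows`) are this example's own and are not taken from BigQuery's flag names, and the sample data is invented.

```python
import csv
import io


def parse_csv(text, delimiter=",", quotechar='"', skip_leading_rows=0):
    """Parse CSV text into rows, with knobs for common variants."""
    # newline="" leaves line endings untouched so the csv module handles them.
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter, quotechar=quotechar)
    rows = list(reader)
    return rows[skip_leading_rows:]


# RFC 4180 style: comma delimiter, CRLF line endings, doubled quotes, header row.
rfc_style = 'id,comment\r\n1,"hello, world"\r\n2,"she said ""hi"""\r\n'

# A common variant: pipe delimiter, single-quote quoting, no header row.
pipe_style = "1|'hello|world'\n2|plain\n"

if __name__ == "__main__":
    print(parse_csv(rfc_style, skip_leading_rows=1))
    print(parse_csv(pipe_style, delimiter="|", quotechar="'"))
```

Both inputs yield rows of the same shape once the right options are supplied, which is the sense in which CSV is a family of related formats rather than a single one.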