NOTE
To try the examples in this chapter, you need access to a project with
billing enabled because BigQuery does not allow loading data into
projects without billing. Billing setup is covered in Chapter 3.
Bulk Loads
Broadly, BigQuery has two modes for loading data:
• Batch or bulk loads of a large number of records
• Single record insertions
In practice, a batch could contain a single record, and a single insert request
can contain multiple records. The more meaningful difference is that the
bulk mode is designed to provide high throughput, and the single-record
mode is designed for low latency. This is reflected in their performance and
in the costs and quotas associated with these modes. This is covered in detail
in the sections on quotas, but the main point is that bulk loads are the mode
best suited to getting a large amount of data into BigQuery quickly.
In addition to enabling high-throughput data transfers into BigQuery, bulk
uploads have an important property: in database terminology, they have
Atomic, Consistent, Isolated, and Durable (ACID) semantics. In simple
terms, this means that a bulk load operation modifies a table in BigQuery so
that:
• The records loaded become visible in queries at the same time. Another
way of saying the same thing is that a query sees all the records from a
load operation or none of them. (Atomic)
• Either the operation succeeds and the table is modified appropriately, or
the operation fails and the table is left unperturbed. (Consistent)
• When the operation is reported as successful, all future queries are
guaranteed to observe the data added by the job. (Durable)
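As a rough illustration of the all-or-nothing behavior described in the list, the following Python sketch models a table in memory. It is a toy model under stated assumptions, not how BigQuery implements load jobs: ToyTable, load, and query are hypothetical names, validation is reduced to a column-and-type check, and durability is not modeled at all because nothing is persisted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyTable:
    """Hypothetical in-memory table: queries only see fully committed loads.

    This is an illustration of atomic, all-or-nothing loading, not
    BigQuery's implementation. Durability is not modeled.
    """
    schema: Dict[str, type]
    _committed: List[dict] = field(default_factory=list, init=False)

    def query(self) -> List[dict]:
        # A query reads only the committed snapshot, never staged records.
        return list(self._committed)

    def load(self, records: List[dict]) -> bool:
        """Load a batch of records; one bad record fails the whole batch."""
        staged: List[dict] = []
        for rec in records:
            if set(rec) != set(self.schema):
                return False  # wrong columns: abort, table left unchanged
            for col, typ in self.schema.items():
                if not isinstance(rec[col], typ):
                    return False  # wrong type: abort, table left unchanged
            staged.append(dict(rec))
        # Publish the entire staged batch in a single assignment, so the
        # committed snapshot gains all of the new records or none of them.
        self._committed = self._committed + staged
        return True


table = ToyTable(schema={"name": str, "count": int})
ok = table.load([{"name": "a", "count": 1}, {"name": "b", "count": 2}])
print(ok, len(table.query()))   # True 2
bad = table.load([{"name": "c", "count": 3}, {"name": "d", "count": "oops"}])
print(bad, len(table.query()))  # False 2
```

The failed second load leaves the two previously committed records untouched, even though its first record was valid, which mirrors the Atomic and Consistent guarantees. Staging records before publishing them in one step is what makes the all-or-nothing behavior easy to reason about in this sketch.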
Isolation is not particularly relevant in this case because load jobs are not
read-modify-write operations, so they are not dependent on the existing
contents of a table. This makes them trivially isolated. These properties of