between records that were in separate tables. In the single table case, it is
likely possible to collect the updates from all 10 independent processes and
combine them into a single load operation. In both cases this would work
around the quota constraint without sacrificing the frequency of updates.
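To make the batching idea concrete, here is a minimal sketch, assuming the google-cloud-bigquery Python client library and a hypothetical destination table my_project.my_dataset.events: records gathered from the independent writers are submitted together as one append-only load job, so the whole batch counts once against the per-table daily quota no matter how many records it contains.

from google.cloud import bigquery

# Hypothetical table name; substitute your own project, dataset, and table.
TABLE_ID = "my_project.my_dataset.events"

def load_batched_updates(collected_rows):
    """Submit records collected from many writers as a single load job.

    collected_rows is a list of dicts, one per record, accumulated from
    the independent processes before one coordinated load.
    """
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    # A single load job consumes one unit of the per-table daily quota,
    # regardless of how many records it carries.
    job = client.load_table_from_json(collected_rows, TABLE_ID, job_config=job_config)
    job.result()  # block until the load job completes
    return job.output_rows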
Another point to note is that small load jobs waste the per-table quota. If you load only a handful of records per load job, at the end of the day your table will contain only a few thousand records, which is not exactly Big Data. The usual reason for issuing small, frequent updates is to keep the table fresh for some real-time data source. Considering that even a small load job can take anywhere from tens of seconds to a couple of minutes, this is not an effective way to keep data fresh or to use your daily quota. This issue sets you up nicely for the next major section of this chapter. As previously mentioned, BigQuery supports both a throughput-optimized load operation and a latency-optimized load operation. When you run into the daily table or project limits, it may be a signal that you should switch to the latency-optimized operation. The next section covers this alternative way of loading data into the service.
Streaming Inserts
If you are familiar with traditional databases, you may wonder why so much
machinery is required to load a couple of records into a table. As discussed
in Chapter 2, “BigQuery Fundamentals,” aspects of the service resemble
a relational database, but at its core BigQuery is a distributed processing
framework optimized for dealing with large amounts of data. As a result, its
primary loading mechanism is geared toward ingesting large quantities of
data rather than individual records. Nevertheless, the service does provide
a simple operation for inserting individual records, referred to here as a
streaming insert. Even though it bears a strong resemblance to the SQL
insert statement, do not be fooled; there are substantial differences. The
API gains its simplicity and low latency by forgoing the strong guarantees offered by the job-based load operation. In contrast to the ACID properties of load jobs, this operation might best be described as Eventual-At-Least-Once. This means that one or more copies of a record inserted
via the streaming API are guaranteed to eventually appear in queries over
the destination table. This may seem like an alarmingly weak promise, but it
is sufficient for a variety of applications. In practice, records inserted via this
API are available immediately and exactly once in queries, which means that the behavior most applications observe is far stronger than the guarantee itself.
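As a concrete illustration of the streaming path, here is a minimal sketch, again assuming the google-cloud-bigquery Python client library; the table name and record fields are hypothetical. The client method wraps the tabledata.insertAll API, and supplying a per-row insert ID enables the service's best-effort deduplication, which softens the at-least-once side of the guarantee.

import uuid

from google.cloud import bigquery

# Hypothetical table name; substitute your own project, dataset, and table.
TABLE_ID = "my_project.my_dataset.events"

def stream_record(record):
    """Insert a single record through the streaming (tabledata.insertAll) path."""
    client = bigquery.Client()
    # A client-supplied row id lets the service deduplicate retried inserts
    # on a best-effort basis.
    errors = client.insert_rows_json(TABLE_ID, [record], row_ids=[str(uuid.uuid4())])
    if errors:
        raise RuntimeError("Streaming insert failed: %s" % errors)

# Example usage with an illustrative record.
stream_record({"user": "alice", "action": "login"})

Unlike a load job, the call returns quickly and reports any per-row errors directly in its response rather than through the status of a job.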