Per-record errors are reported in a format similar to the errors
reported by JSON load jobs. There is no point in retrying invalid inserts,
but connection errors and other transient errors should be retried. If the
request contains only a couple of records and includes the insertId fields, it is
reasonable to retry the entire request; if the request has a large number
of records, it is more efficient to retry only the failed records.
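To make the retry logic concrete, the following sketch (not one of the book's
listings) re-sends only the rows whose errors look transient, using the
tabledata.insertAll() method of the Python API client. Treating the 'invalid'
reason as the only non-retryable error, and retrying just once, are simplifying
assumptions made for illustration.
def retry_failed_rows(tabledata, project_id, dataset_id,
                      table_id, rows):
    '''Send rows once, then re-send only the rows that failed
    with errors that look transient.'''
    response = tabledata.insertAll(
        projectId=project_id,
        datasetId=dataset_id,
        tableId=table_id,
        body={'rows': rows}).execute()
    retry_rows = []
    for insert_error in response.get('insertErrors', []):
        reasons = [err.get('reason')
                   for err in insert_error.get('errors', [])]
        if 'invalid' in reasons:
            # Invalid records will never succeed, so skip them.
            continue
        retry_rows.append(rows[insert_error['index']])
    if retry_rows:
        response = tabledata.insertAll(
            projectId=project_id,
            datasetId=dataset_id,
            tableId=table_id,
            body={'rows': retry_rows}).execute()
    return response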
Just as with load jobs, there is a size limit on an individual request and rate
limits on the total number of requests.
Maximum record size: 100 KB
Maximum bytes per request: 1 MB
Table rate limit: 10,000 rows/second (enforced over 10 seconds)
Project rate limit: 100,000 rows/second
Record size and bytes per request refer to the size computed from the data in
the records, not the JSON-encoded size.
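These limits can also be checked (approximately) on the client before a request
is issued. The sketch below is not from the book: it assumes rows shaped as
{'insertId': ..., 'json': {...}} dictionaries and estimates record size from
the field values alone, which is only a rough stand-in for the size accounting
the service performs.
MAX_RECORD_BYTES = 100 * 1024     # 100 KB per record
MAX_REQUEST_BYTES = 1024 * 1024   # 1 MB per request

def approx_record_bytes(row):
    '''Rough estimate based on the field values, not the
    JSON-encoded request.'''
    return sum(len(str(value)) for value in row['json'].values())

def split_into_requests(rows):
    '''Yield lists of rows whose estimated total size fits in a
    single streaming insert request.'''
    batch, batch_bytes = [], 0
    for row in rows:
        row_bytes = approx_record_bytes(row)
        if row_bytes > MAX_RECORD_BYTES:
            raise ValueError('Record exceeds the per-record limit')
        if batch and batch_bytes + row_bytes > MAX_REQUEST_BYTES:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += row_bytes
    if batch:
        yield batch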
To complete this section, look at how to perform inserts using the
Python client API. Listing 6.2 is a script that accepts a filename as an
argument. It tails the given file (that is, polls for data appearing at the
end), parses each line into a record, and performs an insert. Notice
that it uses the filename and the position of the record as the insertId, which
ensures that if the script is restarted on the same file, the records will not
be duplicated. This is not perfect: if the script is restarted after the
deduplication window has passed, the records will end up duplicated. Fixing
this behavior is left as an exercise for the reader. Another feature to note
is that the script builds batches of up to 10 records before submitting the
request, but only if the records are immediately available. This usually
increases throughput without delaying the delivery of records.
Listing 6.2: stream.py
def tail_and_insert(infile,
                    tabledata,
                    project_id,
                    dataset_id,
                    table_id):
    '''Tail a file and stream its lines to a BigQuery table.

    infile: file object to be tailed.
    tabledata: tabledata() collection of an authorized BigQuery
        service object.
    project_id, dataset_id, table_id: identify the table that
        receives the streamed rows.
    '''
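As a companion to the signature above, here is a minimal sketch of the behavior
the surrounding text describes. It is an illustration rather than the book's
actual code: it assumes the tabledata() collection of the Python API client, a
hypothetical single-column destination schema, and an extra filename argument
used only to build insertIds.
import time

def tail_and_insert_sketch(infile, tabledata, project_id,
                           dataset_id, table_id, filename):
    '''Tail infile and stream its lines, batching up to 10 rows.'''
    rows = []
    while True:
        position = infile.tell()
        line = infile.readline()
        if line:
            rows.append({
                # filename:offset keeps restarts idempotent within
                # the deduplication window.
                'insertId': '%s:%d' % (filename, position),
                # Assumed single-column schema; real parsing depends
                # on the destination table.
                'json': {'line': line.rstrip('\n')}})
        # Flush when 10 rows are buffered or when no more data is
        # immediately available.
        if rows and (len(rows) >= 10 or not line):
            tabledata.insertAll(
                projectId=project_id,
                datasetId=dataset_id,
                tableId=table_id,
                body={'rows': rows}).execute()
            rows = []
        if not line:
            # At end of file: wait briefly before polling again.
            time.sleep(1.0)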