        # No new data so sleep briefly.
        time.sleep(0.1)
        # Re-position the file at the end of the last full record.
        infile.seek(pos)

def main():
    service = auth.build_bq_client()
    with open(sys.argv[1], 'a+') as infile:
        tail_and_insert(infile,
                        service.tabledata(),
                        auth.PROJECT_ID,
                        'ch06',
                        'streamed')

if __name__ == '__main__':
    main()
It is worth calling attention once again to the key feature of the streaming
insert API: records appear in the table as soon as the request completes,
usually within 100 ms of the request being initiated. This enables a number
of real-time use cases in applications, so building a pipeline that uses the
API is a good investment.
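As a minimal sketch of a single streaming insert (assuming the same auth helper module used in the listing above and the ch06.streamed table it targets; the field name in the row is hypothetical and must match the table's schema), one record can be sent with tabledata().insertAll():

import time

import auth  # assumed: the chapter's helper for building an authorized client

def stream_row(tabledata, project_id, dataset_id, table_id, row):
    # An insertId lets BigQuery de-duplicate the row if the request is retried.
    body = {
        'rows': [{
            'insertId': str(time.time()),
            'json': row
        }]
    }
    return tabledata.insertAll(
        projectId=project_id,
        datasetId=dataset_id,
        tableId=table_id,
        body=body).execute()

service = auth.build_bq_client()
result = stream_row(service.tabledata(), auth.PROJECT_ID,
                    'ch06', 'streamed',
                    {'line': 'hello streaming'})  # 'line' is a hypothetical field
# Any per-row failures are reported in insertErrors.
print(result.get('insertErrors', 'no errors'))

Within roughly 100 ms of the call returning, the row should be visible to queries against ch06.streamed.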
Summary
Data storage is a big part of the BigQuery service, so it has a lot of features
related to loading data. This chapter covered all the methods for moving
your data into the service and highlighted common pitfalls. It discussed
using Google Cloud Storage, the Resumable Upload protocol, and multipart
requests as mechanisms for transferring data into the service. Next, it
covered the formats the service currently supports: CSV, JSON, and
Datastore backups. Finally, it explained how to use the low-latency
streaming API for inserting individual records.
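As a minimal sketch of the Google Cloud Storage load path summarized above (assuming the same auth helper; the bucket, object, destination table, and schema are hypothetical), a CSV load job can be started like this:

import auth  # assumed: the chapter's helper for building an authorized client

service = auth.build_bq_client()
job = {
    'configuration': {
        'load': {
            'sourceUris': ['gs://example-bucket/data.csv'],  # hypothetical URI
            'sourceFormat': 'CSV',
            'destinationTable': {
                'projectId': auth.PROJECT_ID,
                'datasetId': 'ch06',
                'tableId': 'loaded_from_gcs'  # hypothetical table name
            },
            'schema': {
                'fields': [{'name': 'line', 'type': 'STRING'}]  # hypothetical schema
            }
        }
    }
}
result = service.jobs().insert(
    projectId=auth.PROJECT_ID, body=job).execute()
print(result['jobReference']['jobId'])

Unlike the streaming API, a load job is asynchronous: the jobId returned here can be polled with jobs().get() until the job completes.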
It is useful to be aware of the full range of options because you are often
constrained by the current location of your data, and you may be able to
avoid complicated transformations if you can use the right combination of
features. When you build a custom data pipeline, this information can help
you design an effective solution. Hopefully, the task