        # No new data so sleep briefly.
        time.sleep(0.1)
        # Re-position the file at the end of the last full record.
        infile.seek(pos)

def main():
    service = auth.build_bq_client()
    with open(sys.argv[1], 'a+') as infile:
        tail_and_insert(infile,
                        service.tabledata(),
                        auth.PROJECT_ID,
                        'ch06',
                        'streamed')

if __name__ == '__main__':
    main()
It is worth calling attention once again to the key feature of the streaming
insert API: records appear in the table as soon as the request completes,
usually within 100 ms of the request being initiated. This enables a number
of real-time use cases in applications, so building a pipeline that uses the
API is a good investment.
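As a minimal sketch of a single streaming insert (assuming the same auth helper module used in the listing above and the ch06.streamed table it targets; the field name in the row is hypothetical and must match the table's schema), one record can be sent with tabledata().insertAll():

import time

import auth  # assumed: the chapter's helper for building an authorized client

def stream_row(tabledata, project_id, dataset_id, table_id, row):
    # An insertId lets BigQuery de-duplicate the row if the request is retried.
    body = {
        'rows': [{
            'insertId': str(time.time()),
            'json': row
        }]
    }
    return tabledata.insertAll(
        projectId=project_id,
        datasetId=dataset_id,
        tableId=table_id,
        body=body).execute()

service = auth.build_bq_client()
result = stream_row(service.tabledata(), auth.PROJECT_ID,
                    'ch06', 'streamed',
                    {'line': 'hello streaming'})  # 'line' is a hypothetical field
# Any per-row failures are reported in insertErrors.
print(result.get('insertErrors', 'no errors'))

Within roughly 100 ms of the call returning, the row should be visible to queries against ch06.streamed.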
Summary
Data storage is a big part of the BigQuery service, so it has a lot of features
related to loading data. This chapter covered all the methods for moving
your data into the service and highlighted common pitfalls. It discussed
using Google Cloud Storage, the Resumable Upload protocol, and multipart
requests as mechanisms for transferring data into the service. Next, it
covered the formats the service currently supports: CSV, JSON, and
Datastore backups. Finally, it explained how to use the low-latency
streaming API for inserting individual records.
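As a minimal sketch of the Google Cloud Storage load path summarized above (assuming the same auth helper; the bucket, object, destination table, and schema are hypothetical), a CSV load job can be started like this:

import auth  # assumed: the chapter's helper for building an authorized client

service = auth.build_bq_client()
job = {
    'configuration': {
        'load': {
            'sourceUris': ['gs://example-bucket/data.csv'],  # hypothetical URI
            'sourceFormat': 'CSV',
            'destinationTable': {
                'projectId': auth.PROJECT_ID,
                'datasetId': 'ch06',
                'tableId': 'loaded_from_gcs'  # hypothetical table name
            },
            'schema': {
                'fields': [{'name': 'line', 'type': 'STRING'}]  # hypothetical schema
            }
        }
    }
}
result = service.jobs().insert(
    projectId=auth.PROJECT_ID, body=job).execute()
print(result['jobReference']['jobId'])

Unlike the streaming API, a load job is asynchronous: the jobId returned here can be polled with jobs().get() until the job completes.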
It is useful to be aware of the full range of options because you are often
constrained by the current location of your data, and you may be able to
avoid complicated transformations if you can use the right combination of
features. When you build a custom data pipeline, this information can help
you design an effective solution. Hopefully, the task