External Data Processing - Google BigQuery Analytics


result_handler:

self.table_reader.read(result_handler)

This listing is more detailed than it strictly needs to be, which simplifies
the subsequent listings that show how to read in parallel threads. The
TableReader class supports both index-based and page token-based reading,
and it adds a snapshot time to the table ID so that every page is read from
the same stable snapshot of the table. The listing also handles errors,
which is important if you want to reliably read a large number of pages
from a table. Finally, there is a TableReadThread class, used in Listings
12.5 and 12.6, that spins up a separate thread to read a table or a portion
of a table. The following example uses a TableReader to read the
publicdata:samples.shakespeare table in a background TableReadThread and
saves the results to a file.

$ python
>>> from table_reader import TableReader
>>> from table_reader import TableReadThread
>>> output_file_name = '/tmp/bigquery/shakespeare'
>>> table_reader = TableReader(project_id='publicdata',
...     dataset_id='samples',
...     table_id='shakespeare')
>>> thread = TableReadThread(table_reader,
...     output_file_name)
>>> thread.start()
Writing results to /tmp/bigquery/shakespeare
>>> thread.join()
Read 65536 rows from start
Read 65536 rows at CIDBB777777QOGQIBCAIABAQQCAAI===
Read 33584 rows at CIDBB777777QOGQIBCAIACAQQCAAI===

[max 65536]
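The page-by-page pattern visible in the output above (one read from the start, then one read per page token) can be sketched without any BigQuery dependency. This is a minimal illustration, not the TableReader or TableReadThread code from the listing: read_pages, FileWriterThread, make_fake_fetcher, and the fetch_page callable are hypothetical names, and the in-memory fetcher stands in for the remote TableData.list() call.

```python
import threading
import time


def read_pages(fetch_page, handle_rows, max_retries=3, retry_delay=0.0):
    """Read every page by following page tokens, retrying failed fetches.

    fetch_page(page_token) is a hypothetical callable that returns
    (rows, next_page_token); next_page_token is None on the last page.
    Returns the total number of rows read.
    """
    page_token = None  # None means "read from the start"
    total = 0
    while True:
        for attempt in range(max_retries):
            try:
                rows, next_token = fetch_page(page_token)
                break
            except IOError:
                # Treat IOError as transient; give up after the last attempt.
                if attempt == max_retries - 1:
                    raise
                time.sleep(retry_delay)
        handle_rows(rows)
        total += len(rows)
        if next_token is None:
            return total
        page_token = next_token


class FileWriterThread(threading.Thread):
    """Hypothetical stand-in for TableReadThread: reads pages in the
    background and writes one row per line to a file."""

    def __init__(self, fetch_page, output_file_name):
        super().__init__()
        self.fetch_page = fetch_page
        self.output_file_name = output_file_name
        self.rows_read = 0

    def run(self):
        with open(self.output_file_name, "w") as out:
            def write_rows(rows):
                for row in rows:
                    out.write(str(row) + "\n")
            self.rows_read = read_pages(self.fetch_page, write_rows)


def make_fake_fetcher(rows, page_size):
    """In-memory replacement for the remote call, for demonstration only.
    The page token is simply the next start index, encoded as a string."""
    def fetch_page(page_token):
        start = int(page_token or 0)
        end = start + page_size
        next_token = str(end) if end < len(rows) else None
        return rows[start:end], next_token
    return fetch_page
```

In this sketch the token is opaque to read_pages: the loop only passes it back to the fetcher, which mirrors how a page token returned by one read is supplied to the next. Retrying inside the loop means a failed page is fetched again with the same token, so no rows are skipped or duplicated.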

Table Decorators

BigQuery provides a mechanism called table decorators that can solve many
of the problems encountered when using TableData.list() to read a table in
parallel. Decorators can be used anywhere you otherwise would read from a
table: in a Query, Copy, or Extract job, or in a TableData.list()
operation. Chapter 11, “Managing Data Stored in BigQuery,” shows some
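As a rough illustration of adding a snapshot time to a table ID, the helper below appends a snapshot decorator to a table name. It assumes the decorator form table@<time>, with the time given in milliseconds since the Unix epoch; snapshot_decorator is a hypothetical name, not a function from the book's listings or from a BigQuery client library.

```python
import time


def snapshot_decorator(table_id, snapshot_ms=None):
    """Return table_id with a snapshot decorator appended.

    Assumes the form table@<milliseconds since the Unix epoch>.
    If snapshot_ms is omitted, the current time is used.
    """
    if snapshot_ms is None:
        snapshot_ms = int(time.time() * 1000)
    return "%s@%d" % (table_id, snapshot_ms)


# Every page request that uses this ID refers to the same point in time.
print(snapshot_decorator("shakespeare", 1400000000000))
# prints shakespeare@1400000000000
```

Computing the decorated ID once and reusing it for every page is what keeps a multi-page or multi-threaded read consistent, even if the table changes while the read is in progress.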
