result_handler:
    self.table_reader.read(result_handler)
This listing is a little more detailed than it needs to be in order to simplify subsequent listings, which show how to read in parallel threads. The TableReader class can do index-based or pagination token-based reading, and it adds a snapshot time to the table ID so that the read is based on a stable snapshot of the table. This listing also handles errors, which is important if you want to reliably read a large number of pages from a table. Finally, there is a TableReadThread class that is used in Listings 12.5 and 12.6 to spin up a separate thread to read a table or a portion of a table. The following example uses a TableReader to read the publicdata:samples.shakespeare table in a background TableReadThread and saves the results to a file.
$ python
>>> from table_reader import TableReader
>>> from table_reader import TableReadThread
>>> output_file_name = '/tmp/bigquery/shakespeare'
>>> table_reader = TableReader(project_id='publicdata',
...     dataset_id='samples',
...     table_id='shakespeare')
>>> thread = TableReadThread(table_reader,
...     output_file_name)
>>> thread.start()
Writing results to /tmp/bigquery/shakespeare
>>> thread.join()
Read 65536 rows from start
Read 65536 rows at CIDBB777777QOGQIBCAIABAQQCAAI===
Read 33584 rows at CIDBB777777QOGQIBCAIACAQQCAAI===
[max 65536]
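The token-based paging, error handling, and snapshot table ID described above can be sketched roughly as follows. This is not the book's TableReader implementation. The names snapshot_table_id, read_all_pages, and make_fake_fetcher are hypothetical, fetch_page stands in for a real TableData.list() call, and the sketch assumes Python 3 and that transient failures surface as IOError.

```python
import time


def snapshot_table_id(table_id, snapshot_ms=None):
    """Append a snapshot decorator (@<milliseconds since epoch>) to a table ID."""
    if snapshot_ms is None:
        snapshot_ms = int(time.time() * 1000)
    return '%s@%d' % (table_id, snapshot_ms)


def read_all_pages(fetch_page, max_retries=3):
    """Yield pages of rows, following page tokens until none is returned.

    fetch_page is a hypothetical callable: fetch_page(page_token) returns
    (rows, next_page_token) and may raise IOError on a transient failure.
    """
    page_token = None
    while True:
        for attempt in range(max_retries):
            try:
                rows, next_token = fetch_page(page_token)
                break
            except IOError:
                # Retry the same page; give up after the last attempt.
                if attempt == max_retries - 1:
                    raise
        yield rows
        if not next_token:
            return
        page_token = next_token


def make_fake_fetcher(total_rows, page_size):
    """Stand-in for the service: serves integer rows and fails once."""
    state = {'failed': False}

    def fetch_page(page_token):
        start = int(page_token or 0)
        if start > 0 and not state['failed']:
            state['failed'] = True
            raise IOError('simulated transient error')
        end = min(start + page_size, total_rows)
        next_token = str(end) if end < total_rows else None
        return list(range(start, end)), next_token

    return fetch_page


if __name__ == '__main__':
    print(snapshot_table_id('shakespeare', 1400000000000))
    for page in read_all_pages(make_fake_fetcher(164656, 65536)):
        print('Read %d rows' % len(page))
```

With 164656 rows and a page size of 65536, the fake fetcher produces pages of the same sizes as the session above. Retrying the same page token after a failure keeps the read position unchanged, which is why a failed page can be retried without skipping or duplicating rows.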
Table Decorators
BigQuery provides a mechanism called table decorators that can solve many of the problems encountered when using TableData.list() to read a table in parallel. Decorators can be used anywhere you otherwise would read from a table: in a Query, Copy, or Extract job, or in a TableData.list() operation. Chapter 11, “Managing Data Stored in BigQuery,” shows some