# First start all of the reader threads.
for index in range(len(partition_readers)):
    partition_readers[index].start(gcs_objects[index])
# Wait for all of the reader threads to complete.
for index in range(len(partition_readers)):
    partition_readers[index].wait_for_complete()
This code has two main parts: the PartitionReader class that downloads files as they become available, and the run_partitioned_extract_job method that launches the partitioned read and waits for it to complete. The PartitionReader extends Python's Thread object and periodically polls for new files to become available. Once the BigQuery job is complete, no more new files will arrive, so the thread exits after reading all remaining files.
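The polling loop described above can be sketched as follows. This is a simplified illustration, not the book's actual PartitionReader: the list_new_files, download, and job_done callables are hypothetical stand-ins for the real Google Cloud Storage listing, download, and BigQuery job-status calls.

```python
import threading
import time

class PollingReader(threading.Thread):
    """Sketch of a reader thread that polls for newly available files.

    list_new_files() returns file names not yet seen, download(name)
    fetches one file, and job_done() reports whether the extract job
    has finished. All three are hypothetical stand-ins.
    """
    def __init__(self, list_new_files, download, job_done,
                 poll_interval=0.01):
        super().__init__()
        self.list_new_files = list_new_files
        self.download = download
        self.job_done = job_done
        self.poll_interval = poll_interval

    def run(self):
        while True:
            # Check job status *before* listing, so a file that appears
            # between the two calls is still picked up on this pass.
            done = self.job_done()
            for name in self.list_new_files():
                self.download(name)
            if done:
                # The job finished and everything that remained has been
                # read; no more files can arrive, so exit.
                break
            time.sleep(self.poll_interval)

    def wait_for_complete(self):
        self.join()
```

Checking the job status before listing is what makes the exit safe: if job_done() returns True, any file the job produced is already visible to the final listing pass, so nothing is missed.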
The run_partitioned_extract_job method starts the job, and then starts all of the PartitionReader threads to read and download the files in parallel. It returns after waiting for all of those threads to complete.
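Assuming the job runner exposes a method that launches the extract job and each reader exposes the start() and wait_for_complete() methods shown earlier, the orchestration reduces to the following sketch. The names here (start_job and the fake classes in the usage note) are hypothetical; this is not the book's exact implementation.

```python
def run_partitioned_extract(job_runner, partition_readers, gcs_objects):
    """Sketch of the orchestration: launch the extract job, start one
    reader thread per output partition, then block until every reader
    has finished downloading. job_runner and the readers are
    hypothetical stand-ins for the book's classes."""
    job_runner.start_job()  # kick off the partitioned extract job
    # Each reader polls and downloads its own partition's files.
    for reader, gcs_object in zip(partition_readers, gcs_objects):
        reader.start(gcs_object)
    # Returns only after every partition has been fully downloaded.
    for reader in partition_readers:
        reader.wait_for_complete()
```

Starting all of the readers before waiting on any of them is what gives the parallelism: the downloads overlap with one another and with the extract job itself.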
The following code snippet shows how to use run_partitioned_extract_job to download the publicdata:samples.shakespeare table in three partitions:
$ python
>>> from extract_and_partitioned_read import run_partitioned_extract_job
>>> from job_runner import JobRunner
>>> from gcs_reader import GcsReader
>>> project_id='bigquery-e2e'
>>> gcs_bucket='bigquery-e2e'
>>> run_partitioned_extract_job(
...     JobRunner(project_id=project_id),
...     [GcsReader(gcs_bucket=gcs_bucket,
...                download_dir='/tmp/bigquery') for x in range(3)],
...     source_project_id='publicdata',
...     source_dataset_id='samples',
...     source_table_id='shakespeare')
[0] STARTING on gs://bigquery-e2e/