# First start all of the reader threads.
for index in range(len(partition_readers)):
  partition_readers[index].start(gcs_objects[index])
# Wait for all of the reader threads to complete.
for index in range(len(partition_readers)):
  partition_readers[index].wait_for_complete()
This code has two main parts: the PartitionReader class, which downloads files as they become available, and the run_partitioned_extract_job method, which launches the partitioned read and waits for it to complete. PartitionReader extends Python's Thread object and periodically polls for new files to become available. Once the BigQuery job is complete, no more new files will arrive, so the reader exits after reading all remaining files.
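
The heart of that polling loop can be sketched as follows. This is a minimal illustration of the pattern rather than the sample code itself: the gcs_reader.read() method (assumed here to return None when an object does not yet exist), the job_is_done() callable, and the convention of passing the object name pattern to start() are all assumed interfaces.

import threading
import time

class PartitionReader(threading.Thread):
  '''Polls one extract partition for new files and downloads them.

  Sketch only: gcs_reader.read() and job_is_done() are assumed
  interfaces, not necessarily those of the sample code.
  '''

  def __init__(self, gcs_reader, job_is_done, poll_interval_secs=5.0):
    super().__init__()
    self.gcs_reader = gcs_reader      # Downloads a single GCS object.
    self.job_is_done = job_is_done    # Reports whether the job finished.
    self.poll_interval_secs = poll_interval_secs

  def start(self, object_pattern):
    # Remember which object names to poll for, then start the thread.
    self.object_pattern = object_pattern
    super().start()

  def run(self):
    file_index = 0
    while True:
      # Extract files are numbered sequentially, so poll for the next
      # index until it appears.
      object_name = self.object_pattern % (file_index,)
      if self.gcs_reader.read(object_name) is not None:
        file_index += 1
      elif self.job_is_done():
        # The job has finished and the next file never appeared, so
        # every remaining file has already been read.
        break
      else:
        time.sleep(self.poll_interval_secs)

  def wait_for_complete(self):
    self.join()

The ordering of the checks in run() matters: the thread exits only when the job has finished and the next expected file still does not exist, which ensures that files arriving just before job completion are still downloaded.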
The run_partitioned_extract_job method starts the extract job and then starts all of the PartitionReader threads to read and download the files in parallel. It returns after waiting for all of those threads to complete.
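
Putting those pieces together, the driver might look roughly like this sketch, which builds on the PartitionReader sketch above. The start_job() and job_is_done() helpers on JobRunner, the gcs_bucket attribute on GcsReader, and the output/ object layout are assumptions for illustration; the extract configuration fields (sourceTable, destinationUris, destinationFormat) and the rule that BigQuery replaces each '*' wildcard with a 12-digit file number are standard BigQuery API behavior.

def run_partitioned_extract_job(job_runner, gcs_readers,
                                source_project_id, source_dataset_id,
                                source_table_id):
  '''Sketch of the driver, built on the PartitionReader sketch above.'''
  # One single-wildcard URI per partition makes BigQuery split the
  # extract into that many independent streams of numbered files.
  destination_uris = [
      'gs://%s/output/%s-%d-*.json' % (  # 'output/' layout is assumed.
          reader.gcs_bucket, source_table_id, index)
      for index, reader in enumerate(gcs_readers)]
  # Each '*' is replaced by a 12-digit zero-padded file number, which
  # is what the readers poll for.
  gcs_objects = [
      'output/%s-%d-%%012d.json' % (source_table_id, index)
      for index in range(len(gcs_readers))]
  job_runner.start_job({  # start_job() is an assumed JobRunner helper.
      'extract': {
          'sourceTable': {
              'projectId': source_project_id,
              'datasetId': source_dataset_id,
              'tableId': source_table_id},
          'destinationUris': destination_uris,
          'destinationFormat': 'NEWLINE_DELIMITED_JSON'}})
  partition_readers = [
      PartitionReader(reader, job_runner.job_is_done)  # assumed helper
      for reader in gcs_readers]
  # First start all of the reader threads.
  for index in range(len(partition_readers)):
    partition_readers[index].start(gcs_objects[index])
  # Wait for all of the reader threads to complete.
  for index in range(len(partition_readers)):
    partition_readers[index].wait_for_complete()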
The following code snippet shows how to use run_partitioned_extract_job to download the publicdata:samples.shakespeare table in three partitions:
$ python
>>> from extract_and_partitioned_read import run_partitioned_extract_job
>>> from job_runner import JobRunner
>>> from gcs_reader import GcsReader
>>> project_id='bigquery-e2e'
>>> gcs_bucket='bigquery-e2e'
>>> run_partitioned_extract_job(
...     JobRunner(project_id=project_id),
...     [GcsReader(gcs_bucket=gcs_bucket,
...                download_dir='/tmp/bigquery')
...      for x in range(3)],
...     source_project_id='publicdata',
...     source_dataset_id='samples',
...     source_table_id='shakespeare')
[0] STARTING on gs://bigquery-e2e/