    threads.append(read_thread)
    threads[index].start()
for index in range(partition_count):
    threads[index].join()
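The fragment above is the tail of a loop that starts one reader thread per partition and then joins them all. A self-contained sketch of that start-then-join pattern might look like the following; `read_partition` here is a stand-in for the per-partition read function, not the book's actual implementation:

```python
import threading

# Sketch of the start-then-join pattern, assuming a hypothetical
# read_partition() function; the real one would read that partition's
# slice of the table.
partition_count = 4
results = [None] * partition_count

def read_partition(index):
    # Placeholder work standing in for the actual partition read.
    results[index] = index * index

threads = []
for index in range(partition_count):
    read_thread = threading.Thread(target=read_partition, args=(index,))
    threads.append(read_thread)
    threads[index].start()
for index in range(partition_count):
    threads[index].join()

print(results)  # → [0, 1, 4, 9]
```

Joining every thread before using `results` ensures all partitions have been read; without the second loop, the main thread could observe partially filled results.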
Extract Jobs Versus TableData.list() for Reading Data in Parallel
Both Extract jobs and TableData.list() let you read data from
tables in parallel. When should you use one versus the other? The
answer, unsurprisingly, depends on how you want to read the data. If
you want to read the table like a file—that is, read 1K bytes at a
time—you will likely want to use the output of an Extract job. Extract
produces files in Google Cloud Storage (GCS) that you can read
multiple times and in any byte range you choose, and you can download
the files using standard HTTP resumable download operations.
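Because the extracted files support arbitrary byte-range reads, you can split a file into fixed-size chunks and fetch each chunk in parallel with an HTTP `Range` header. A minimal sketch of the range computation, using a helper name (`byte_ranges`) of our own invention:

```python
# Sketch: splitting an extracted GCS file into inclusive byte ranges
# for parallel download. Each (start, end) pair would become an HTTP
# header of the form "Range: bytes=start-end".

def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte ranges covering total_size bytes."""
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1
        yield (start, end)

# Example: a 10 MB file split into 4 MB chunks.
for start, end in byte_ranges(10 * 2**20, 4 * 2**20):
    print("Range: bytes=%d-%d" % (start, end))
```

Each range can then be downloaded independently (and retried independently), which is what makes the file-oriented path easy to parallelize.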
TableData.list(), however, lets you read a specific number of rows
but doesn't give you control over bytes. To read all the data, you need to
use a page token to fetch the next section of data. This means that you
can't just plug it in as-is to download your tables.
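The paging loop itself is simple: keep requesting pages and passing back the returned page token until none is returned. The sketch below uses an injectable `fetch_page` callable standing in for the actual tabledata.list() request; the `"rows"`/`"pageToken"` keys mirror the response shape, but the helper is our illustration, not library code:

```python
# Sketch of the TableData.list() paging pattern with a hypothetical
# fetch_page callable standing in for the API request.

def read_all_rows(fetch_page):
    rows, token = [], None
    while True:
        response = fetch_page(page_token=token)
        rows.extend(response.get("rows", []))
        token = response.get("pageToken")
        if token is None:  # no token means this was the last page
            return rows

# Fake two-page response to show the control flow.
pages = {None: {"rows": [1, 2], "pageToken": "p2"},
         "p2": {"rows": [3]}}
print(read_all_rows(lambda page_token: pages[page_token]))  # → [1, 2, 3]
```

Note that the token chain forces sequential fetching within one reader; parallelism with TableData.list() instead comes from giving each reader its own row range, as in the threaded example earlier.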
There are latency trade-offs as well. Extract jobs require you to wait for
the data to be produced, but when it is ready, you can download at the
speed of your Internet connection. TableData.list(), however, lets
you read data immediately, but the effective bandwidth will be lower
because the data has to be transcoded into your desired format
on the fly.
AppEngine MapReduce
There are a number of reasons you might want to extract data from
BigQuery. One common case is when a certain data transformation cannot
be expressed as a query within the service. For instance, it could be any
combination of the following: