External Data Processing - Google BigQuery Analytics

TableReader. Finally, it waits for all of the threads to complete. Following

is an example of running the indexed table reader to read your favorite table,

publicdata:samples.shakespeare, in three parallel threads:

$ python

>>> import tabledata_index

>>> tabledata_index.parallel_indexed_read(

... 3, 'publicdata', 'samples', 'shakespeare',

... '/tmp/bigquery')

publicdata:samples.shakespeare last modified at 1335916045099

Reading [0-54885)

Writing results to /tmp/bigquery/shakespeare.0

Reading [54885-109770)

Writing results to /tmp/bigquery/shakespeare.1

Reading [109770-164655)

Writing results to /tmp/bigquery/shakespeare.2

Read 54885 rows at 54885

Read 54885 rows at 109770

Read 54885 rows at 0
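The output above shows the 164655 rows split into three equal half-open ranges, one per thread. The following is a minimal sketch of that partitioning and of waiting for the threads to finish. It does not reproduce the tabledata_index module; split_row_ranges, read_range, and parallel_read are hypothetical names, and read_range only records a row count where a real reader would fetch rows.

```python
import threading


def split_row_ranges(total_rows, num_threads):
    """Split [0, total_rows) into contiguous half-open ranges, one per thread."""
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    base, extra = divmod(total_rows, num_threads)
    ranges = []
    start = 0
    for i in range(num_threads):
        # The first `extra` ranges absorb one leftover row each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def read_range(start, end, results, index):
    # Hypothetical stand-in for a real per-range read: it only records
    # how many rows the range covers.
    results[index] = end - start


def parallel_read(total_rows, num_threads):
    ranges = split_row_ranges(total_rows, num_threads)
    results = [None] * len(ranges)
    threads = [
        threading.Thread(target=read_range, args=(start, end, results, i))
        for i, (start, end) in enumerate(ranges)
    ]
    for t in threads:
        t.start()
    # Wait for all of the threads to complete before returning.
    for t in threads:
        t.join()
    return ranges, results
```

Because each thread writes to its own slot in results, the threads need no locking, and they may finish in any order, which matches the out-of-order "Read ... rows" lines in the output above.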

Time Range Decorators

Another way to split up a table is to use a time range decorator, which allows

you to read only data that was added to a table during a particular time

range, for example:

publicdata:samples.wikipedia@1386465812000-1386465899999

Time range decorators create a view of the table containing only the data

that was added between those two timestamps. As with snapshot decorators,

the times used in time range decorators must be within the last 7 days.
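As a rough illustration, a decorated table name can be assembled from two timestamps in milliseconds since the epoch, with a check for the 7-day limit before any request is sent. The helper name time_range_decorator and its now_ms parameter are assumptions made for this sketch, not part of any BigQuery client library.

```python
import time

SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000


def time_range_decorator(table, start_ms, end_ms, now_ms=None):
    """Return 'table@start-end' for two millisecond timestamps.

    Hypothetical helper: it only formats and validates the string.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if start_ms > end_ms:
        raise ValueError("start_ms must not be later than end_ms")
    if start_ms < now_ms - SEVEN_DAYS_MS:
        raise ValueError("start_ms is more than 7 days in the past")
    if end_ms > now_ms:
        raise ValueError("end_ms is in the future")
    return "%s@%d-%d" % (table, start_ms, end_ms)
```

Passing now_ms explicitly keeps the check reproducible; for example, with now_ms=1386465900000 the helper rebuilds the decorated name shown earlier from its two timestamps.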

How is reading only a time slice of data in a table useful when reading out

a table? It is useful because you might not have to read out the whole table.

Maybe you read the table yesterday at time T, so today you need to read only

the data that was added between T and now. If you had to read out the entire

table page by page, it might take a long time, but the data that was added in

the last 24 hours might be much more manageable.
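The incremental pattern described here can be sketched as a small checkpointing helper: remember the time of the last read, and on the next run ask only for the range from that time until now. The function name next_incremental_read and the fallback to a full read are assumptions for illustration. How rows that land exactly on a range boundary are treated is not covered in this section, so check that before relying on back-to-back ranges.

```python
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000


def next_incremental_read(table, last_read_ms, now_ms):
    """Return (table_to_read, new_checkpoint_ms).

    Hypothetical helper. If there is no earlier checkpoint, or it is older
    than the 7-day decorator limit, fall back to reading the whole table.
    """
    if last_read_ms is None or last_read_ms < now_ms - SEVEN_DAYS_MS:
        return table, now_ms
    # Only the data added between the last read and now.
    return "%s@%d-%d" % (table, last_read_ms, now_ms), now_ms
```

The caller stores the returned checkpoint after a successful read and passes it back as last_read_ms on the next run.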
