Using BigQuery from Third-Party Tools - Google BigQuery Analytics

installed bq or have not authorized your Google account using it, you should follow the instructions in Chapter 3, “Getting Started with BigQuery.” Assuming you have authenticated with bq or the Google Cloud SDK, it is easy to run a query in pandas:

$ python
>>> from pandas.io import gbq
>>> data_frame = gbq.read_gbq(
...     'SELECT COUNT(*) FROM [publicdata:samples.shakespeare]')
Waiting on bqjob_r2f6dcee956cff5bd_0000014460593881_1 … (0s) Current status: DONE
>>> print "%s" % (data_frame,)
      f0_
0  164656

[1 rows x 1 columns]

If the gbq.read_gbq() command works without returning an error, then you're all set to begin using BigQuery from pandas.
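The "works without returning an error" check can be wrapped in a small helper. This is only a sketch: bigquery_ready and its run_query parameter are hypothetical names introduced here for illustration, and the query string is the one from the session above. The helper takes the query function as an argument so that it does not depend on any particular pandas version.

```python
# Hypothetical helper (not part of pandas or bq): runs the smoke-test
# query from the session above and reports whether it raised.
SMOKE_TEST_QUERY = 'SELECT COUNT(*) FROM [publicdata:samples.shakespeare]'


def bigquery_ready(run_query):
    """Return True if run_query executes the smoke-test query without raising."""
    try:
        run_query(SMOKE_TEST_QUERY)
    except Exception:
        # Any failure (missing credentials, no network, bad install)
        # is treated the same way here: the setup is not ready yet.
        return False
    return True
```

With the import shown earlier, the call would be bigquery_ready(gbq.read_gbq). Catching every exception is a deliberate simplification for a setup check; real code would normally let the error surface so that its message can be read.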

Pandas Example: Clustering Shakespeare

In the R example, we tried to classify Shakespeare texts into genres: tragedies, histories, or comedies. To do so, we needed to know the genre in advance for some of the plays in order to train our machine learning model. Wouldn't it be nice if we didn't have to know any genres in advance but could still classify the plays? We could divide the plays into “natural” buckets and then go back and see whether those buckets have any real-world meaning. That is, instead of providing classifications at the beginning, we can provide them after we've already sorted the plays into buckets. Of course, the buckets may not correspond to genre, but they might instead show some hidden similarity between the plays, or align themselves in other ways, like early or late plays or even plays that were written by Shakespeare's evil twin brother.
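To make the idea of sorting first and labeling afterward concrete, here is a minimal sketch of a nearest-center grouping loop (the k-means idea) in plain Python. It is a toy under stated assumptions, not the chapter's implementation: the function name, the choice to seed the centers with the first k points, and the fixed iteration count are all illustrative, and a real analysis would more likely use a library routine.

```python
def kmeans(points, k, iterations=10):
    """Group points (tuples of numbers) into k buckets.

    Returns one bucket index per input point. No labels are supplied:
    the buckets come only from how close the points are to each other.
    """
    # Assumption for this sketch: seed the centers with the first k points.
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty bucket keeps its previous center
                centers[c] = tuple(
                    sum(dim) / float(len(members)) for dim in zip(*members))
    return labels
```

Calling kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2) on these made-up points puts the first two in one bucket and the last two in another. Deciding what each bucket means is left to whoever inspects the result afterward.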

This type of analysis is called unsupervised learning, and there are a lot of different algorithms you can use to approach the problem. One of the standard algorithms is called k-means clustering, which groups unlabeled
