fields, which are not fields on the original table; they were computed by
aggregating row values.
Query #3: Joins
SELECT shakespeare.word AS word,
  SUM(shakespeare.word_count / english.count) AS rel_freq
FROM [publicdata:samples.shakespeare] AS shakespeare
JOIN [bigquery-e2e:reference.word_frequency] AS english
ON shakespeare.word = english.word
GROUP BY word
ORDER BY rel_freq DESC
LIMIT 10
For the third query, we combine the Shakespeare data with another data
set, one that tells you the overall frequency of words in the English language.
This other data set contains two fields: word, which is the word in question,
and count, which is the total number of times the word appeared when
scanning a large corpus of English-language documents. This data came
from a freely available word-frequency list; we have downloaded the list and
saved it in a table named bigquery-e2e:reference.word_frequency.
Armed with a list of overall English word frequency, you can now see how
the word frequencies in Shakespeare compare to overall English-language
usage. Query #3 computes the relative frequency between Shakespeare's
usage and English-language usage.
We no longer have to filter based on word length because (we assume)
the high-frequency words in the Shakespeare corpus like “the” will also be
high-frequency in the English-language corpus and won't show up in the top
of the results. If they do, that might be someone's doctoral thesis waiting
to happen: “An analysis of the preponderance of definite article usage in
Shakespeare. What does usage of the word 'the' in the Bard's early plays tell
us about socio-economic conflict in Elizabethan England?”
The way to combine multiple data sets in this way in SQL is to use a
JOIN operation. A JOIN takes two tables and matches every row in the first table