| learn'd | 30.0 |
| begg'd | 25.0 |
| offer'd | 22.0 |
| mock'd | 21.0 |
| prevail'd | 20.0 |
| wash'd | 20.0 |
| he'll | 19.75 |
+-----------+----------+
The word with the highest relative frequency in a Shakespeare play compared
with modern English is “villainy.” In fact, “villainy” is the only word in the
top 10 that isn't a contraction such as “wrong'd.” The Ph.D. dissertation on
this subject is left as an exercise for the reader.
Query #4: Subselects
SELECT shakespeare.word AS word,
       SUM(shakespeare.word_count / english.cnt) AS rel_freq
FROM (
  SELECT LOWER(word) AS word,
         word_count / 945845 AS word_count
  FROM [publicdata:samples.shakespeare]
  WHERE NOT REGEXP_MATCH(word, '[A-Z]+')
    AND NOT word CONTAINS "'"
) AS shakespeare
JOIN (
  SELECT LOWER(word) AS word,
         count / 121464569 AS cnt
  FROM [bigquery-e2e:reference.word_frequency]
) AS english
ON shakespeare.word = english.word
GROUP BY word
ORDER BY rel_freq DESC
LIMIT 100
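To make the arithmetic in the query easier to follow, here is a minimal Python sketch of the same steps: filter, normalize each side by its corpus total, join on the lowercased word, and sum the ratios. The rows and the two totals are made-up stand-ins (the query itself uses 945845 and 121464569), and the function name relative_frequencies is hypothetical. It also assumes the English table has one row per word, which the query does not require.

```python
from collections import defaultdict

# Hypothetical toy rows; the real tables are far larger.
shakespeare_rows = [      # (word, word_count), one row per play
    ("villainy", 3),
    ("villainy", 2),
    ("the", 400),
    ("The", 100),         # dropped: contains a capital letter
    ("learn'd", 6),       # dropped: contains an apostrophe
]
english_rows = [("villainy", 1), ("the", 5000)]

SHAKESPEARE_TOTAL = 1000  # stand-in for 945845 in the query
ENGLISH_TOTAL = 100000    # stand-in for 121464569 in the query


def relative_frequencies(shakespeare, english, s_total, e_total, limit=100):
    """Mirror the query: filter, normalize, join, SUM, ORDER BY, LIMIT."""
    # English subselect: lowercase the word and normalize its count.
    # Assumption: one row per word, so a plain dict is enough.
    english_freq = {word.lower(): count / e_total for word, count in english}

    rel = defaultdict(float)
    for word, count in shakespeare:
        # WHERE NOT REGEXP_MATCH(word, '[A-Z]+') AND NOT word CONTAINS "'"
        if any("A" <= ch <= "Z" for ch in word) or "'" in word:
            continue
        key = word.lower()
        if key in english_freq:  # inner JOIN keeps only matching words
            # SUM(shakespeare.word_count / english.cnt), grouped by word
            rel[key] += (count / s_total) / english_freq[key]

    # ORDER BY rel_freq DESC LIMIT 100
    ranked = sorted(rel.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


result = relative_frequencies(
    shakespeare_rows, english_rows, SHAKESPEARE_TOTAL, ENGLISH_TOTAL
)
```

With these toy numbers, “villainy” ranks above “the” because its share of the Shakespeare sample is far larger than its share of the English sample, which is the effect the query is designed to surface.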
One problem with query #3 is that the words in the Shakespeare table have
inconsistent capitalization. That is, if a word appears at the beginning
of a sentence, its first letter is capitalized, but if it appears elsewhere in the