Database Reference
In-Depth Information
The next line is:
FROM [publicdata:samples.shakespeare]
After the selected field list comes the
FROM
clause, which instructs the query
engine where to find the data. Fully specified BigQuery table names are
designated by
project_id_or_number:dataset_name.table_name
.
That said, you often don't need to use the fully specified name. The project
ID defaults to the project that runs the query. If you don't like specifying the
dataset name either and are using the API (as opposed to the query UI), you
can set a default dataset in the job query configuration.
You may have noticed the funny brackets around the table name that aren't
there in standard SQL. These are generally optional, but some table names
require these quote characters to parse correctly. For example, if the table
name was
shakespeare-plays
rather than
shakespeare
, the query
parser would have a difficult time telling whether this was a subtraction
operation (
shakespeare
minus
plays
) or a table name. To prevent
parsing ambiguities, it is usually safest to just include the brackets.
Moving to the next line in the query, you have:
WHERE corpus CONTAINS 'king' AND LENGTH(word) > 5
This line contains the
WHERE
clause, which enables you to filter which rows
are returned. In this case, the filter returns only rows where the
corpus
field contains “king” and the
word
field is longer than five letters. You can
also call most functions here; the test for words longer than five letters
in the name could have been written as
REGEXP_MATCH(word,
'\\w{5}\\w+')
.
One of the last pieces of a query is the optional
ORDER BY
clause, which
enables you to choose the sort order of the query results:
ORDER BY frequency DESC
One trick that can come in handy is to order by the index of the field in the
SELECT
clause. In this case,
frequency
is the second field mentioned, so
this could have been written as
ORDER BY 2 DESC
.