Running Queries - Google BigQuery Analytics

Database Reference

In-Depth Information

The next line is:

FROM [publicdata:samples.shakespeare]

After the selected field list comes the FROM clause, which instructs the query

engine where to find the data. Fully specified BigQuery table names are

designated by project_id_or_number:dataset_name.table_name .

That said, you often don't need to use the fully specified name. The project

ID defaults to the project that runs the query. If you don't like specifying the

dataset name either and are using the API (as opposed to the query UI), you

can set a default dataset in the job query configuration.

You may have noticed the funny brackets around the table name that aren't

there in standard SQL. These are generally optional, but some table names

require these quote characters to parse correctly. For example, if the table

name was shakespeare-plays rather than shakespeare , the query

parser would have a difficult time telling whether this was a subtraction

operation ( shakespeare minus plays ) or a table name. To prevent

parsing ambiguities, it is usually safest to just include the brackets.

Moving to the next line in the query, you have:

WHERE corpus CONTAINS 'king' AND LENGTH(word) > 5

This line contains the WHERE clause, which enables you to filter which rows

are returned. In this case, the filter returns only rows where the corpus

field contains “king” and the word field is longer than five letters. You can

also call most functions here; the test for words longer than five letters

in the name could have been written as REGEXP_MATCH(word,

'\\w{5}\\w+') .

One of the last pieces of a query is the optional ORDER BY clause, which

enables you to choose the sort order of the query results:

ORDER BY frequency DESC

One trick that can come in handy is to order by the index of the field in the

SELECT clause. In this case, frequency is the second field mentioned, so

this could have been written as ORDER BY 2 DESC .

Search WWH ::

Custom Search

Home