Database Reference
In-Depth Information
lowercase. The LOWER() function operates only on a single row at a time. In
Query #2, however, you have COUNT() and SUM() . These are aggregation
functions, which operate over a field in multiple rows at once.
If you take a query without a GROUP BY and add an aggregation function
to the SELECT line, you'll get only a single result; that result will be the
value of the aggregation function applied over the entire table. Note there
are rules on mixing aggregation functions with nonaggregations: You either
need aggregations everywhere or nowhere. If you think about this, it makes
sense: If you compute the sum over a particular field, you want one result,
but if you return the field as well, you want multiple results. Because you
can't satisfy both of these constraints at once, this situation is disallowed.
You may notice, however, that we did have a mixture of aggregation
functions and raw fields in the query; we are selecting word as well as
SUM() and COUNT() . Why is this legal? You can legally perform this
operation because of the GROUP BY clause: GROUP BY word . This causes
the query engine to partition the table into buckets—one for each value of
the word field. The aggregation functions then get applied to each of the
buckets, outputting one row per distinct word value.
To give a concrete example, consider the word “bagpipe,” which occurs twice
in Merchant of Venice , once in King Henry IV part 1 and once in Winter's
Tale . When you group by word , you get a single bucket for “bagpipe.” The
COUNT() operation returns “3” because there are three plays in which it
appears. The SUM(word_count) operation returns 4 since it appears four
times across the three plays. In the query results, then, you'd get one row for
“bagpipe” in the results: {word: bagpipe, corpora: 3, total: 4} .
There is one more line we haven't mentioned yet and doesn't look familiar
from query #1:
HAVING corpora >=2 AND corpora <> total
A HAVING clause is a lot like a WHERE clause; this one filters out any word
that appears only in one corpus or once per corpus. The difference between
a WHERE and HAVING clause is when the filtering is applied. A WHERE clause
filters values in the original table; it gets applied before any aggregation is
done. However, a HAVING clause applies filters after any aggregation, so you
can use the aggregated fields. This HAVING clause uses the corpora and total
Search WWH ::




Custom Search