(1,2)
(2,4)
There is no guarantee which order the rows will be processed in. In particular, when
retrieving the contents of A using DUMP or STORE, the rows may be written in any order. If
you want to impose an order on the output, you can use the ORDER operator to sort a
relation by one or more fields. The default sort order compares fields of the same type using
the natural ordering, and different types are given an arbitrary, but deterministic, ordering
(a tuple is always “less than” a bag, for example).
The following example sorts A by the first field in ascending order and by the second field
in descending order:
grunt> B = ORDER A BY $0, $1 DESC;
grunt> DUMP B;
(1,2)
(2,4)
(2,3)
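As a further sketch of how the sort key and direction shape the output, the following
sorts A by the second field alone, in descending order. The alias B_desc is a name chosen
here for illustration, and the output shown assumes A holds exactly the three tuples that
appear in the dump of B above:
grunt> B_desc = ORDER A BY $1 DESC;
grunt> DUMP B_desc;
(2,4)
(2,3)
(1,2)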
Any further processing on a sorted relation is not guaranteed to retain its order. For
example:
grunt> C = FOREACH B GENERATE *;
Even though relation C has the same contents as relation B, its tuples may be emitted in
any order by a DUMP or a STORE. For this reason, it is usual to perform the ORDER
operation just before retrieving the output.
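A minimal sketch of that practice: do the other processing first, and apply ORDER as the
last step before writing the result. The aliases T and S and the output path
'output/sorted' are placeholders chosen here for illustration, not names from the text:
grunt> T = FOREACH A GENERATE *;
grunt> S = ORDER T BY $0, $1 DESC;
grunt> STORE S INTO 'output/sorted';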
The LIMIT statement is useful for limiting the number of results as a quick-and-dirty way
to get a sample of a relation. (Random sampling using the SAMPLE operator, or
prototyping with the ILLUSTRATE command, should be preferred for generating more
representative samples of the data.) It can be used immediately after the ORDER statement
to retrieve the first n tuples. Usually, LIMIT will select any n tuples from a relation, but
when used immediately after an ORDER statement, the order is retained (an exception
to the rule that processing a relation does not retain its order):
grunt> D = LIMIT B 2;
grunt> DUMP D;
(1,2)
(2,4)
If the limit is greater than the number of tuples in the relation, all tuples are returned (so
LIMIT has no effect).
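Combining the two operators gives a simple "top n" pattern. The following sketch retrieves
the single tuple with the largest second field. The aliases E and F are names chosen here
for illustration, and the output assumes A holds the three tuples shown earlier:
grunt> E = ORDER A BY $1 DESC;
grunt> F = LIMIT E 1;
grunt> DUMP F;
(2,4)
For a more representative sample than LIMIT provides, SAMPLE takes a fraction between 0
and 1. The alias G and the fraction 0.5 are again illustrative, and because the selection
is random, no output is shown:
grunt> G = SAMPLE A 0.5;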