(0,1,2)
(0,5,2)
(1,3,4)
(1,7,8)
grunt> SPLIT c INTO d IF $0 == 0, e IF $0 == 1;
grunt> DUMP d;
(0,1,2)
(0,5,2)
grunt> DUMP e;
(1,3,4)
(1,7,8)
The UNION operator allows duplicates. You can use the DISTINCT operator to remove
duplicates from a relation. Our SPLIT operation on c sends a tuple to d if its first field
($0) is 0, and to e if it's 1. It's possible to write conditions such that some rows will go to
both d and e or to neither. You can simulate SPLIT with multiple FILTER operators. The
FILTER operator alone trims a relation down to only the tuples that pass a certain test:
grunt> f = FILTER c BY $1 > 3;
grunt> DUMP f;
(0,5,2)
(1,7,8)
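The points in the paragraph above about UNION, DISTINCT, and simulating SPLIT can be sketched as a short Pig script. It assumes a hypothetical comma-delimited file c.txt holding the same four rows as c (0,1,2; 0,5,2; 1,3,4; 1,7,8), and the aliases u, v, d2, e2, x, and y are illustrative names only. Expected tuples are noted in comments, though DUMP does not guarantee output order.

```pig
-- Hypothetical input: c.txt holds the rows 0,1,2 / 0,5,2 / 1,3,4 / 1,7,8
c = LOAD 'c.txt' USING PigStorage(',') AS (a1:int, a2:int, a3:int);

-- UNION keeps duplicates: u contains every tuple of c twice (8 tuples).
u = UNION c, c;
-- DISTINCT removes the duplicates again, leaving the 4 original tuples.
v = DISTINCT u;
DUMP v;

-- Two FILTERs reproduce SPLIT c INTO d IF $0 == 0, e IF $0 == 1;
d2 = FILTER c BY $0 == 0;
e2 = FILTER c BY $0 == 1;
DUMP d2;  -- expected: (0,1,2), (0,5,2)
DUMP e2;  -- expected: (1,3,4), (1,7,8)

-- SPLIT conditions need not be exclusive or exhaustive:
-- (0,5,2) passes both tests below, while (1,3,4) passes neither.
SPLIT c INTO x IF $1 > 3, y IF $2 == 2;
DUMP x;   -- expected: (0,5,2), (1,7,8)
DUMP y;   -- expected: (0,1,2), (0,5,2)
```

Because each FILTER is evaluated independently, overlapping FILTER conditions behave the same way as overlapping SPLIT branches.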
We've seen LIMIT being used to take a specified number of tuples from a relation.
SAMPLE is an operator that randomly samples tuples in a relation according to a specified
percentage.
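A minimal sketch of LIMIT and SAMPLE, again assuming the hypothetical c.txt with the four rows of c. In Pig the SAMPLE size is written as a fraction between 0 and 1, so 0.5 corresponds to 50 percent. Neither operator guarantees which tuples come back, so no expected output is shown.

```pig
-- Hypothetical input: c.txt holds the rows 0,1,2 / 0,5,2 / 1,3,4 / 1,7,8
c = LOAD 'c.txt' USING PigStorage(',') AS (a1:int, a2:int, a3:int);

-- LIMIT returns at most 2 tuples; without an ORDER BY, which 2 is not guaranteed.
h = LIMIT c 2;
DUMP h;

-- SAMPLE keeps each tuple with probability 0.5, so both the number and the
-- identity of the output tuples can change from run to run.
s = SAMPLE c 0.5;
DUMP s;
```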
The operations so far are relatively simple in the sense that they operate on each
tuple as an atomic unit. More complex data processing, on the other hand, will require
working on groups of tuples together. We'll next look at operators for grouping. Unlike
previous operators, these grouping operators will create new schemas in their output
that rely heavily on bags and nested data types. The generated schema may take a little
time to get used to at first. Keep in mind that these grouping operators are almost
always for generating intermediate data. Their complexity is only temporary on your
way to computing the final results.
The simpler of these operators is GROUP. Continuing with the same set of relations
we used earlier,
grunt> g = GROUP c BY $2;
grunt> DUMP g;
(2,{(0,1,2),(0,5,2)})
(4,{(1,3,4)})
(8,{(1,7,8)})
grunt> DESCRIBE c;
c: {a1: int,a2: int,a3: int}
grunt> DESCRIBE g;
g: {group: int,c: {a1: int,a2: int,a3: int}}
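The grouping paragraph earlier notes that GROUP's nested output is usually an intermediate step toward final results. As a hedged sketch of that next step (assuming the same four tuples of c, loaded here from a hypothetical comma-delimited file c.txt), the bag inside each tuple of g can be reduced with Pig's built-in COUNT and SUM functions:

```pig
-- Hypothetical input: c.txt holds the rows 0,1,2 / 0,5,2 / 1,3,4 / 1,7,8
c = LOAD 'c.txt' USING PigStorage(',') AS (a1:int, a2:int, a3:int);
g = GROUP c BY $2;

-- Each tuple of g is (group, c), where c is a bag of the original tuples.
-- COUNT(c) counts the tuples in each bag; SUM(c.a2) adds their a2 fields.
h = FOREACH g GENERATE group, COUNT(c), SUM(c.a2);
DUMP h;   -- expected: (2,2,6), (4,1,3), (8,1,7)
```

The bag-valued schema disappears once each bag is aggregated, leaving one flat tuple per group key.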
We've created a new relation, g, by grouping tuples in c that have the same value in
the third column ($2, also named a3). The output of GROUP always has two fields. The
first field is the group key, which is a3 in this case. The second field is a bag containing
 