Databases Reference
In-Depth Information
each pairing, the tuple produced has the values from R, the key values, and the
values from S.
2.3.9 Grouping and Aggregation by Map-Reduce
As with the join, we shall discuss the minimal example of grouping and aggrega-
tion, where there is one grouping attribute and one aggregation. Let R(A, B, C)
be a relation to which we apply the operator γ A,θ(B) (R). Map will perform the
grouping, while Reduce does the aggregation.
The Map Function: For each tuple (a, b, c) produce the key-value pair (a, b).
The Reduce Function: Each key a represents a group. Apply the aggregation
operator θ to the list [b 1 , b 2 , . . . , b n ] of B-values associated with key a. The
output is the pair (a, x), where x is the result of applying θ to the list. For
example, if θ is SUM, then x = b 1 + b 2 ++ b n , and if θ is MAX, then x is
the largest of b 1 , b 2 , . . . , b n .
If there are several grouping attributes, then the key is the list of the values
of a tuple for all these attributes. If there is more than one aggregation, then
the Reduce function applies each of them to the list of values associated with
a given key and produces a tuple consisting of the key, including components
for all grouping attributes if there is more than one, followed by the results of
each of the aggregations.
2.3.10 Matrix Multiplication
If M is a matrix with element m ij in row i and column j, and N is a matrix
with element n jk in row j and column k, then the product P = M N is the
matrix P with element p ik in row i and column k, where
p ik =
m ij n jk
j
It is required that the number of columns of M equals the number of rows of
N , so the sum over j makes sense.
We can think of a matrix as a relation with three attributes: the row number,
the column number, and the value in that row and column. Thus, we could
view matrix M as a relation M (I, J, V ), with tuples (i, j, m ij ) and we could view
matrix N as a relation N (J, K, W ), with tuples (j, k, n jk ). As large matrices are
often sparse (mostly 0's), and since we can omit the tuples for matrix elements
that are 0, this relational representation is often a very good one for a large
matrix. However, it is possible that i, j, and k are implicit in the position of a
matrix element in the file that represents it, rather than written explicitly with
the element itself. In that case, the Map function will have to be designed to
construct the I, J , and K components of tuples from the position of the data.
The product M N is almost a natural join followed by grouping and ag-
gregation.
That is, the natural join of M (I, J, V ) and N (J, K, W ), having
Search WWH ::




Custom Search