Database Reference
In-Depth Information
The Reduce function looks at all the key-value pairs with a given key and combines
those values from R with those values of S in all possible ways. From each pairing, the
tuple produced has the values from R , the key values, and the values from S .
2.3.8
Grouping and Aggregation by MapReduce
As with the join, we shall discuss the minimal example of grouping and aggregation, where
there is one grouping attribute and one aggregation. Let R ( A, B, C ) be a relation to which
we apply the operator γ A ( B ) ( R ). Map will perform the grouping, while Reduce does the ag-
gregation.
x For each tuple ( a, b, c ) produce the key-value pair ( a, b ).
The Reduce Function Each key a represents a group. Apply the aggregation operator θ to
the list [ b 1 , b 2 , . . . , b n ] of B -values associated with key a . The output is the pair ( a, x ),
where x is the result of applying θ to the list. For example, if θ is SUM, then x = b 1 + b 2 +
· · · + b n , and if θ is MAX, then x is the largest of b 1 , b 2 , . . . , b n .
If there are several grouping attributes, then the key is the list of the values of a tuple for
all these attributes. If there is more than one aggregation, then the Reduce function applies
each of them to the list of values associated with a given key and produces a tuple consist-
ing of the key, including components for all grouping attributes if there is more than one,
followed by the results of each of the aggregations.
2.3.9
Matrix Multiplication
If M is a matrix with element m ij in row i and column j , and N is a matrix with element n jk
in row j and column k , then the product P = MN is the matrix P with element p ik in row i
and column k , where
It is required that the number of columns of M equals the number of rows of N , so the sum
over j makes sense.
We can think of a matrix as a relation with three attributes: the row number, the column
number, and the value in that row and column. Thus, we could view matrix M as a relation
M ( I, J, V ), with tuples ( i, j, m ij ), and we could view matrix N as a relation N ( J, K, W ), with
tuples ( j, k, n jk ). As large matrices are often sparse (mostly 0's), and since we can omit the
tuples for matrix elements that are 0, this relational representation is often a very good one
Search WWH ::




Custom Search