Database Reference
In-Depth Information
The Reduce function looks at all the key-value pairs with a given key and combines
those values from
R
with those values of
S
in all possible ways. From each pairing, the
tuple produced has the values from
R
, the key values, and the values from
S
.
2.3.8
Grouping and Aggregation by MapReduce
As with the join, we shall discuss the minimal example of grouping and aggregation, where
there is one grouping attribute and one aggregation. Let
R
(
A, B, C
) be a relation to which
we apply the operator
γ
A
,θ
(
B
)
(
R
). Map will perform the grouping, while Reduce does the ag-
gregation.
x
For each tuple (
a, b, c
) produce the key-value pair (
a, b
).
The Reduce Function
Each key
a
represents a group. Apply the aggregation operator
θ
to
the list [
b
1
, b
2
, . . . , b
n
] of
B
-values associated with key
a
. The output is the pair (
a, x
),
where
x
is the result of applying
θ
to the list. For example, if
θ
is SUM, then
x
=
b
1
+
b
2
+
· · · +
b
n
, and if
θ
is MAX, then
x
is the largest of
b
1
,
b
2
, . . . , b
n
.
If there are several grouping attributes, then the key is the list of the values of a tuple for
all these attributes. If there is more than one aggregation, then the Reduce function applies
each of them to the list of values associated with a given key and produces a tuple consist-
ing of the key, including components for all grouping attributes if there is more than one,
followed by the results of each of the aggregations.
2.3.9
Matrix Multiplication
If
M
is a matrix with element
m
ij
in row
i
and column
j
, and
N
is a matrix with element
n
jk
in row
j
and column
k
, then the product
P
=
MN
is the matrix
P
with element
p
ik
in row
i
and column
k
, where
It is required that the number of columns of
M
equals the number of rows of
N
, so the sum
over
j
makes sense.
We can think of a matrix as a relation with three attributes: the row number, the column
number, and the value in that row and column. Thus, we could view matrix
M
as a relation
M
(
I, J, V
), with tuples (
i, j, m
ij
), and we could view matrix
N
as a relation
N
(
J, K, W
), with
tuples (
j, k, n
jk
). As large matrices are often sparse (mostly 0's), and since we can omit the
tuples for matrix elements that are 0, this relational representation is often a very good one