}
});
assertEquals("{(5,apple),(6,banana;cherry)}", dump(d));
NOTE
String concatenation is not commutative, so the result is not deterministic. This may or may not be important in your application!
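To see why the order matters, here is a minimal plain-Java sketch that joins the same two values in both arrival orders. It does not use Crunch at all; the class name ConcatOrderDemo and its concat() helper are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the Crunch API.
class ConcatOrderDemo {
  // Joins the values with ";" in exactly the order they arrive.
  static String concat(List<String> values) {
    return String.join(";", values);
  }

  public static void main(String[] args) {
    // The same two values for key 6, arriving in different orders.
    System.out.println(concat(Arrays.asList("banana", "cherry"))); // banana;cherry
    System.out.println(concat(Arrays.asList("cherry", "banana"))); // cherry;banana
  }
}
```

Both outputs are valid concatenations of the same inputs, so a test that asserts one exact string is relying on an arrival order that the framework does not promise.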
The code is cluttered somewhat by the use of Pair objects in the process() method signature; they have to be unwrapped with calls to first() and second(), and a new Pair object is created to emit the new key-value pair. This combining function does not alter the key, so we can use an overloaded form of combineValues() that takes an Aggregator object for operating only on the values and passes the keys through unchanged. Even better, we can use a built-in Aggregator implementation for performing string concatenation found in the Aggregators class. The code becomes:
PTable<Integer, String> e = c.combineValues(Aggregators.STRING_CONCAT(";", false));
assertEquals("{(5,apple),(6,banana;cherry)}", dump(e));
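The idea of aggregating only the values while the keys pass through untouched can be sketched in plain Java. This is an assumption-laden stand-in, not Crunch code: the class ValueAggregationSketch and its concatValues() method are hypothetical, and an in-memory map plays the role of the grouped table.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a values-only aggregation; not Crunch code.
class ValueAggregationSketch {
  // Joins the values for each key with the separator; keys are copied unchanged.
  static Map<Integer, String> concatValues(Map<Integer, List<String>> grouped,
                                           String separator) {
    Map<Integer, String> result = new LinkedHashMap<>();
    for (Map.Entry<Integer, List<String>> entry : grouped.entrySet()) {
      result.put(entry.getKey(), String.join(separator, entry.getValue()));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<Integer, List<String>> grouped = new LinkedHashMap<>();
    grouped.put(5, Arrays.asList("apple"));
    grouped.put(6, Arrays.asList("banana", "cherry"));
    System.out.println(concatValues(grouped, ";")); // {5=apple, 6=banana;cherry}
  }
}
```

Note that the caller supplies only the rule for combining values; nothing in that rule ever sees or rebuilds a key, which is what removes the Pair unwrapping from the earlier version.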
Sometimes you may want to aggregate the values in a PGroupedTable and return a result with a different type from the values being grouped. This can be achieved using the mapValues() method with a MapFn for converting the iterable collection into another object. For example, the following calculates the number of values for each key:
PTable<Integer, Integer> f = c.mapValues(new MapFn<Iterable<String>, Integer>() {
  @Override
  public Integer map(Iterable<String> input) {
    return Iterables.size(input);
  }
}, ints());
assertEquals("{(5,1),(6,2)}", dump(f));
Notice that the values are strings, but the result of applying the map function is an integer, the size of the iterable collection computed using Guava's Iterables class.
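The change of type can be made explicit with a small generic sketch in plain Java. Everything here is hypothetical (the class MapValuesSketch, its mapValues() helper, and the hand-written size() method that stands in for Guava's Iterables.size()); it only mirrors the shape of the operation, with an in-memory map in place of the grouped table.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of mapping grouped values to a different type; not Crunch code.
class MapValuesSketch {
  // Applies fn to the whole collection of values for each key. R may differ from V.
  static <K, V, R> Map<K, R> mapValues(Map<K, List<V>> grouped,
                                       Function<Iterable<V>, R> fn) {
    Map<K, R> result = new LinkedHashMap<>();
    for (Map.Entry<K, List<V>> entry : grouped.entrySet()) {
      result.put(entry.getKey(), fn.apply(entry.getValue()));
    }
    return result;
  }

  // Counts the elements of an iterable by walking it once.
  static int size(Iterable<?> values) {
    int count = 0;
    for (Object ignored : values) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    Map<Integer, List<String>> grouped = new LinkedHashMap<>();
    grouped.put(5, Arrays.asList("apple"));
    grouped.put(6, Arrays.asList("banana", "cherry"));
    // String values in, Integer results out.
    Map<Integer, Integer> counts = mapValues(grouped, MapValuesSketch::size);
    System.out.println(counts); // {5=1, 6=2}
  }
}
```

The function receives every value for a key at once, which is what lets it return any type it likes.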
You might wonder why the combineValues() operation exists at all, given that the mapValues() method is more powerful. The reason is that combineValues() can be run as a MapReduce combiner, and therefore it can improve performance by being run on the map side, which has the effect of reducing the amount of data that has to be trans-