After filtering any bad rows out of the data, my next move is to build a compound key from the extracted data fields via the Set Key Values step. This step creates a combined comb_key key value from Fields 2 and 3, separating the string values with a dash. (Actually, this is created as a user-defined Java expression.) It also creates a comb_value value field with a value of 1, as shown in Figure 10-9.
Figure 10-9. Java expression for Set Key Values step of mapper transformation
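To make the key-building logic concrete, here is a minimal Java sketch of what the Set Key Values step computes. It is not the expression from the figure: the class name, method names, and sample field values are all hypothetical, and only the dash separator, the comb_key and comb_value names, and the constant value of 1 come from the text.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical sketch of the key-building logic; not PDI's own code.
class CompoundKeySketch {

    // Joins two extracted field values with a dash, mirroring comb_key.
    static String combKey(String field2, String field3) {
        return field2 + "-" + field3;
    }

    // Pairs the compound key with a constant 1, mirroring comb_value.
    static Map.Entry<String, Integer> toKeyValue(String field2, String field3) {
        return new SimpleEntry<>(combKey(field2, field3), 1);
    }

    public static void main(String[] args) {
        // The sample values below are invented for illustration only.
        Map.Entry<String, Integer> kv = toKeyValue("FORD", "2014");
        System.out.println(kv.getKey() + "\t" + kv.getValue());
    }
}
```

Emitting a constant 1 for every row means that summing the values for a key later yields the number of rows that share that key.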
These values are then passed to the MapReduce Output step, which assigns the comb_key and comb_value variables to the mapper output variables key and value, as shown in Figure 10-10.
Figure 10-10. Output step of mapper transformation
That completes the definition of the mapper transformation that creates a key and value from the incoming data.
As described earlier, the reducer transformation accepts the incoming key/value pairs and sorts them, then groups them by key value, sums the values, and finally outputs the results. Figure 10-11 shows the structure of the reducer transformation. The Input step is the same as for the mapper transformation, but Figure 10-12 provides greater detail on the Sort Rows step. Here, I define a single field: the sort key field (at the bottom). I also reduce the value in the sort size field because I didn't have much data and I wanted to save memory.
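The sort, group, and sum sequence performed by the reducer can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, the sample pairs are invented, and a TreeMap is used to sort and group in a single pass, whereas the transformation itself uses separate steps such as Sort Rows.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the reducer's logic; not PDI's own code.
class ReducerSketch {

    // Groups incoming pairs by key and sums their values.
    // TreeMap keeps the keys in sorted order, standing in for the sort step.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            // merge() inserts the value, or adds it to the existing total.
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Invented mapper output: each compound key arrives with a value of 1.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("FORD-2014", 1));
        pairs.add(new SimpleEntry<>("AUDI-2013", 1));
        pairs.add(new SimpleEntry<>("FORD-2014", 1));

        for (Map.Entry<String, Integer> total : reduce(pairs).entrySet()) {
            System.out.println(total.getKey() + "\t" + total.getValue());
        }
    }
}
```

Sorting matters in the step-based version because grouping adjacent rows only works when rows with equal keys sit next to each other; the TreeMap in this sketch sidesteps that requirement by indexing on the key directly.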