Using Map Reduce, I want to create a count of manufacturer and model groupings from the data. The first step
is to create two transformations: a mapper and a reducer. The mapper receives the file lines and splits those lines
into fields. It filters out any empty lines and creates a compound key from the fields I am interested in. Finally, it outputs
one record per line, consisting of the compound key and a value of 1. The reducer receives the output from the mapper,
sorts it by key, and then sums the values for each key. It then outputs a sorted list of keys, each with its summed value.
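The mapper and reducer logic described above can be sketched outside PDI in a few lines of Python. This is only an illustration of the counting technique, not how PDI executes the transformations. The column positions MANUFACTURER_IDX and MODEL_IDX are hypothetical, because the layout of the source file is not given here, and the comma used to join the compound key is also an assumption.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple

# Hypothetical column positions; the real indices depend on the source CSV layout.
MANUFACTURER_IDX = 1
MODEL_IDX = 2


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Split each line into fields, skip empty lines, emit (compound_key, 1)."""
    for line in lines:
        if not line.strip():
            continue  # filter out empty (or whitespace-only) lines
        fields = line.rstrip("\n").split(",")
        # Compound key built from the manufacturer and model fields.
        key = f"{fields[MANUFACTURER_IDX]},{fields[MODEL_IDX]}"
        yield key, 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Sort the mapper output by key, then sum the values for each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, sum(value for _, value in group)
```

Because the pairs are sorted before grouping, `groupby` sees all records for a key consecutively, which mirrors the sort-then-sum behavior of the reducer. For example, `list(reducer(mapper(["2014,FORD,FOCUS", "", "2014,FORD,FOCUS", "2014,HONDA,CIVIC"])))` gives `[("FORD,FOCUS", 2), ("HONDA,CIVIC", 1)]` for these made-up sample lines.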
Figure 10-6 shows the structure of the mapper transformation.
Figure 10-6. Input step of mapper transformation
Each Map Reduce transformation must start with a Map Reduce Input step and end with a Map Reduce Output step.
To set up the sequence, I click the Design tab in the Explorer pane of the main PDI interface, then drag the
components from the Design view to the Working pane on the right. To connect the components into a workflow,
I click a component to open a drop-down menu below it, click the rightmost green arrow icon (bordered in red in
Figure 10-6), and then drag the arrow to the next component to connect them. The arrow indicates the
direction of flow and shows whether the action is unconditional or occurs only when the result is True or False. By
double-clicking a component, such as Map Reduce Input, I can open its configuration, as shown in Figure 10-6. Here,
I can see that the input component has two inputs, called “key” and “value,” both typed as “string” and taken from the
HDFS file data.
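To make the shape of those “key” and “value” inputs concrete, here is a small Python sketch of the records a mapper might receive. It assumes a plain-text input format like Hadoop's TextInputFormat, where the key is the byte offset of the line within the file and the value is the line itself; both are converted to strings to match the String type shown on the Map Reduce Input step. The function name text_input_records is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def text_input_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (key, value) string pairs, one per line of text.

    Assumes lines still carry their trailing newline so that the byte
    offset (the key) advances correctly, as with TextInputFormat.
    """
    offset = 0
    for line in lines:
        # Key: byte offset of the line; value: the line without its newline.
        yield str(offset), line.rstrip("\n")
        offset += len(line.encode("utf-8"))
```

For instance, `list(text_input_records(["a,b\n", "c,d\n"]))` returns `[("0", "a,b"), ("4", "c,d")]`. In the mapper described here, only the value carries useful data; the key is ignored.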
Double-clicking the Split Fields icon opens its configuration, as shown in Figure 10-7. This component receives
a file line containing a comma-separated set of fields that must be split into separate values before they can be
manipulated. That is exactly what this step does: it splits the string-based value field, using a comma as the separator, and
creates 14 new fields, Field 1 to Field 14.
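The effect of the Split Fields step can be approximated in Python as follows. This is a sketch, not PDI's implementation: the function name split_fields is hypothetical, and the handling of short lines (missing trailing fields become None) and long lines (fields beyond the fourteenth are dropped) is an assumption made for illustration.

```python
from typing import Dict, Optional

NUM_FIELDS = 14  # Field 1 .. Field 14, as configured in the step


def split_fields(value: str, separator: str = ",") -> Dict[str, Optional[str]]:
    """Split a comma-separated value into named fields Field 1 .. Field 14.

    Assumption: missing trailing fields are set to None and any fields
    beyond the fourteenth are ignored.
    """
    parts = value.split(separator)
    return {
        f"Field {i + 1}": parts[i] if i < len(parts) else None
        for i in range(NUM_FIELDS)
    }
```

Naming the outputs “Field 1” to “Field 14” keeps them aligned with the step configuration, so later steps can pick out the manufacturer and model fields by name when building the key.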
 