41         $word =~ s/\-//g ; # remove - character from word
42         $word =~ s/\///g ; # remove / character from word
43         $word =~ s/\{//g ; # remove { character from word
44         $word =~ s/\}//g ; # remove } character from word
45         $word =~ s/\}//g ; # remove } character from word
46
47         # only print the key,value pair if the key is not
48         # empty
49
50         if ( $word ne "" )
51         {
52                 print "$word,1\n" ;
53         }
54
55     }
56
57 }
This script reads text from STDIN, the Linux standard input stream, at line 8; breaks the input into lines, and then into words at line 16; and strips unwanted characters from each word between lines 28 and 45. Finally, it prints each word as a key-value pair in the form word,1 at line 52. You can check the Map script's behavior by running it locally, as shown below.
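For example, piping a line of text through the script should produce one word,1 pair per word. (The file name mapper.pl is an assumption; this excerpt does not show the Map script's name.)

[hadoop@hc1nn perl]$ echo "one two two three" | perl mapper.pl
one,1
two,1
two,1
three,1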
Now look at the Reduce script in the Perl file reducer.pl:
[hadoop@hc1nn perl]$ cat reducer.pl
01 #!/usr/bin/perl
02
03 my $line;
04 my @lineparams = ();
05 my ($oldword, $word, $value, $sumval);
06
07 # the reducer is going to receive a key,value pair from stdin and it
08 # will need to sum up the values. It will need to split the name and
09 # value out of the comma separated string.
10
11 $oldword = "" ;
12
13 foreach $line ( <STDIN> )
14 {
15     # strip new line from string
16
17     chomp( $line );
18
19     # split the line into the word and value
20
21     @lineparams = split( '\,', $line );
22
23     $word  = $lineparams[0];
24     $value = $lineparams[1];
25
26     # Hadoop sorts the data by key so just sum similar word values
27
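The listing is cut off here by the page break. From the comments, the rest of the loop needs to add up the values for consecutive identical words, print each word's total when a new word appears, and then flush the final word after the loop ends. A minimal sketch of that remaining logic, reconstructed from those comments rather than copied from the book, might look like this:

    if ( $word eq $oldword )
    {
        # same word as the previous line; add to the running total
        $sumval += $value ;
    }
    else
    {
        # a new word has started; print the previous word's total
        if ( $oldword ne "" )
        {
            print "$oldword,$sumval\n" ;
        }
        $sumval = $value ;
    }
    $oldword = $word ;
}

# print the total for the last word once the input is exhausted
if ( $oldword ne "" )
{
    print "$oldword,$sumval\n" ;
}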
 
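Because the reducer only sums values for consecutive identical words, it depends on its input arriving sorted by key; in a Hadoop Streaming job, the framework's sort phase guarantees this between the Map and Reduce stages. You can emulate the whole job locally by placing a sort between the two scripts (input.txt and mapper.pl are placeholder names):

[hadoop@hc1nn perl]$ cat input.txt | perl mapper.pl | sort | perl reducer.pl

Each line of output should then be a word followed by its total count, for example two,2.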