41         $word =~ s/\-//g ; # remove - character from word
42         $word =~ s/\///g ; # remove / character from word
43         $word =~ s/\{//g ; # remove { character from word
44         $word =~ s/\}//g ; # remove } character from word
45         $word =~ s/\}//g ; # remove } character from word
46
47         # only print the key,value pair if the key is not
48         # empty
49
50         if ( $word ne "" )
51         {
52                 print "$word,1\n" ;
53         }
54
55     }
56
57 }
This script reads text from STDIN, the Linux standard input stream, at line 8; breaks the input into lines, and then into words at line 16; and strips unwanted characters from each word between lines 28 and 45. Finally, it prints each word as a key-value pair in the form word,1 at line 52. You can check the Map script's behavior by running it locally, as shown below.
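For example, piping a line of text through the script should produce one word,1 pair per word. (The file name mapper.pl is an assumption; this excerpt does not show the Map script's name.)

[hadoop@hc1nn perl]$ echo "one two two three" | perl mapper.pl
one,1
two,1
two,1
three,1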
Now look at the Reduce script in the Perl file reducer.pl:
[hadoop@hc1nn perl]$ cat reducer.pl
01 #!/usr/bin/perl
02
03 my $line;
04 my @lineparams = ();
05 my ($oldword, $word, $value, $sumval);
06
07 # the reducer is going to receive a key,value pair from stdin and it
08 # will need to sum up the values. It will need to split the name and
09 # value out of the comma separated string.
10
11 $oldword = "" ;
12
13 foreach $line ( <STDIN> )
14 {
15     # strip new line from string
16
17     chomp( $line );
18
19     # split the line into the word and value
20
21     @lineparams = split( '\,', $line );
22
23     $word  = $lineparams[0];
24     $value = $lineparams[1];
25
26     # Hadoop sorts the data by key so just sum similar word values
27
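The listing is cut off here by the page break. From the comments, the rest of the loop needs to add up the values for consecutive identical words, print each word's total when a new word appears, and then flush the final word after the loop ends. A minimal sketch of that remaining logic, reconstructed from those comments rather than copied from the book, might look like this:

    if ( $word eq $oldword )
    {
        # same word as the previous line; add to the running total
        $sumval += $value ;
    }
    else
    {
        # a new word has started; print the previous word's total
        if ( $oldword ne "" )
        {
            print "$oldword,$sumval\n" ;
        }
        $sumval = $value ;
    }
    $oldword = $word ;
}

# print the total for the last word once the input is exhausted
if ( $oldword ne "" )
{
    print "$oldword,$sumval\n" ;
}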
 
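Because the reducer only sums values for consecutive identical words, it depends on its input arriving sorted by key; in a Hadoop Streaming job, the framework's sort phase guarantees this between the Map and Reduce stages. You can emulate the whole job locally by placing a sort between the two scripts (input.txt and mapper.pl are placeholder names):

[hadoop@hc1nn perl]$ cat input.txt | perl mapper.pl | sort | perl reducer.pl

Each line of output should then be a word followed by its total count, for example two,2.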