I have already created a number of scripts in the Linux file system directory /home/hadoop/perl:
[hadoop@hc1nn perl]$ ls
mapper.pl test1.sh wc_clean.sh wordcount.sh
reducer.pl test2.sh wc_output.sh
The file names ending in .pl are Perl scripts, while those ending in .sh are shell scripts used either to test the Perl
scripts or to run them. The Map function is in the file mapper.pl and looks like this:
[hadoop@hc1nn perl]$ cat mapper.pl
#!/usr/bin/perl

my $line;
my @words = ();
my $word;

# process input line by line

foreach $line ( <STDIN> )
{
    # strip new line from string

    chomp( $line );

    # split line into words on whitespace

    @words = split( ' ', $line );

    # now print the name value pairs

    foreach $word (@words)
    {
        # convert word to lower case

        $word = lc( $word ) ;

        # remove unwanted characters from string

        $word =~ s/!//g ;   # remove ! character from word
        $word =~ s/"//g ;   # remove " character from word
        $word =~ s/'//g ;   # remove ' character from word
        $word =~ s/_//g ;   # remove _ character from word
        $word =~ s/;//g ;   # remove ; character from word
        $word =~ s/\(//g ;  # remove ( character from word
        $word =~ s/\)//g ;  # remove ) character from word
        $word =~ s/\#//g ;  # remove # character from word
        $word =~ s/\$//g ;  # remove $ character from word
        $word =~ s/\&//g ;  # remove & character from word
        $word =~ s/\.//g ;  # remove . character from word
        $word =~ s/\,//g ;  # remove , character from word
        $word =~ s/\*//g ;  # remove * character from word
 