Processing Data with Map Reduce - Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset

I have already created a number of scripts in the Linux file system directory /home/hadoop/perl:

[hadoop@hc1nn perl]$ ls
mapper.pl   test1.sh  wc_clean.sh   wordcount.sh
reducer.pl  test2.sh  wc_output.sh

The file names ending in .pl are Perl scripts, while those ending in .sh are shell scripts used either to test the Perl scripts or to run them. The Map function is in the file mapper.pl and looks like this:

[hadoop@hc1nn perl]$ cat mapper.pl

#!/usr/bin/perl

my $line;
my @words = ();
my $word;

# process input line by line

foreach $line ( <STDIN> )
{
  # strip new line from string

  chomp( $line );

  # split line into words using space

  @words = split( ' ', $line );

  # now print the name value pairs

  foreach $word (@words)
  {
    # convert word to lower case

    $word = lc( $word ) ;

    # remove unwanted characters from string

    $word =~ s/!//g ;  # remove ! character from word
    $word =~ s/"//g ;  # remove " character from word
    $word =~ s/'//g ;  # remove ' character from word
    $word =~ s/_//g ;  # remove _ character from word
    $word =~ s/;//g ;  # remove ; character from word
    $word =~ s/\(//g ; # remove ( character from word
    $word =~ s/\)//g ; # remove ) character from word
    $word =~ s/\#//g ; # remove # character from word
    $word =~ s/\$//g ; # remove $ character from word
    $word =~ s/\&//g ; # remove & character from word
    $word =~ s/\.//g ; # remove . character from word
    $word =~ s/\,//g ; # remove , character from word
    $word =~ s/\*//g ; # remove * character from word
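
Each of the substitutions shown above deletes one character from the word. As a sketch only, and not the code used in mapper.pl, the same lower-casing and deletions could be written with a single tr///d call. The helper name clean_word and the sample string are hypothetical, chosen for illustration, and the example reads a fixed string rather than STDIN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of mapper.pl): lower-case a word and delete
# the same thirteen characters that the listing removes one at a time.
sub clean_word {
    my ($word) = @_;
    $word = lc $word;
    # tr with the /d modifier deletes every listed character. tr does not
    # interpolate variables, so $ and & are treated as literal characters.
    $word =~ tr/!"'_;()#$&.,*//d;
    return $word;
}

# Small demonstration on a fixed string (an assumption for illustration).
my @words = map { clean_word($_) } split ' ', 'Hello, World! (It_s "fine".)';
print join( ' ', @words ), "\n";
```

One tr///d call scans each word once instead of thirteen times, but the separate substitutions in the listing are easier to extend or comment out one character at a time.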
