he 1585
for 1614
at 1621
his 1795
had 1839
it 1918
my 1921
as 1950
with 2563
that 2726
was 3119
I 4532
in 5149
a 5649
to 6230
and 7826
of 10538
the 18128
This is an SQL-like example that uses HiveQL to run a word count. Hive is easy to install and use, and its HiveQL interface provides powerful access to the data. Not including the COUNT(*) line, the word-count job took just three
statements, and each statement issued to the Hive CLI was passed on to Hadoop as a Map Reduce task.
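For illustration, those three statements might look something like the sketch below; the table name rawtext and the input path are placeholders, not the exact names used in the example:
CREATE TABLE rawtext (line STRING);
LOAD DATA INPATH '/user/hadoop/hive/wc_input' INTO TABLE rawtext;
SELECT word, COUNT(*) AS wcount
FROM rawtext LATERAL VIEW explode(split(line, ' ')) words AS word
GROUP BY word
ORDER BY wcount;
The LATERAL VIEW explode(split(...)) construct turns each line of text into one row per word, so that a simple GROUP BY can count the words.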
Whether you employ Hive for Map Reduce will depend on the data you are using, its type, and the relationships
between it and any other data streams you might wish to incorporate. You have to view your use of HiveQL in terms of
your ETL chains, that is, the sequence of steps that will transform your data. You might find that HiveQL doesn't offer
the functionality needed to process your data; in that case, you would choose either Pig Latin or Java instead.
Map Reduce with Perl
Additionally, you can use the streaming functionality provided with Hadoop. The important point
to note is that this approach allows you to process streams of data using the Hadoop libraries. With the Hadoop streaming
functionality, you can create Map Reduce jobs from many kinds of executable script, including Perl, Python, and Bash. It is
best used for textual data, as it streams the data between Hadoop and the external scripts.
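In general form, a streaming job is submitted by passing the streaming jar to the hadoop command, together with the HDFS input and output paths and the executables to act as mapper and reducer. The sketch below shows that general form only; the paths and script names are placeholders:
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.2.1.jar \
  -input /user/hadoop/perl/input \
  -output /user/hadoop/perl/output \
  -mapper mapper.pl \
  -reducer reducer.pl \
  -file mapper.pl \
  -file reducer.pl
The -file options ship the scripts out to the cluster nodes so that each Map and Reduce task can execute them locally.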
In this example, you will run a Perl-based word-count task. (I present this example in Perl simply because I am
familiar with that language.) The streaming library is shipped within the Hadoop release as a jar file; it is this
library that allows users to write their own scripts and have Hadoop use them
for Map Reduce:
[hadoop@hc1nn hadoop]$ pwd
/usr/local/hadoop
[hadoop@hc1nn hadoop]$ ls -l contrib/streaming/hadoop-*streaming*.jar
-rw-rw-r--. 1 hadoop hadoop 107399 Jul 23 2013 contrib/streaming/hadoop-streaming-1.2.1.jar
First, we need a Perl working directory on HDFS, called /user/hadoop/perl, which will be used for the result data from
the Map Reduce run:
[hadoop@hc1nn python]$ hadoop dfs -mkdir /user/hadoop/perl
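The job also needs mapper and reducer scripts. The sketch below shows what a minimal pair might look like; the names mapper.pl and reducer.pl and the exact word-splitting logic are illustrative assumptions, not the precise scripts used here:
#!/usr/bin/perl
# mapper.pl - read text from STDIN, emit "word <TAB> 1" for every word
use strict;
use warnings;

while ( my $line = <STDIN> ) {
    chomp $line;
    foreach my $word ( split /\s+/, $line ) {
        print "$word\t1\n" if length $word;
    }
}

#!/usr/bin/perl
# reducer.pl - input arrives sorted by word; sum the counts for each word
use strict;
use warnings;

my $current;      # the word currently being totaled
my $count = 0;

while ( my $line = <STDIN> ) {
    chomp $line;
    my ( $word, $num ) = split /\t/, $line;
    if ( defined $current && $word ne $current ) {
        print "$current\t$count\n";    # word changed: output the finished total
        $count = 0;
    }
    $current = $word;
    $count  += $num;
}
print "$current\t$count\n" if defined $current;    # output the last word
Because Hadoop streaming talks to the scripts only through standard input and standard output, this is all that is required of them; no Perl-specific Hadoop API is involved.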
 