Execute the HadoopClient project, and your console output should display messages similar to the following:
Starting MapReduce job. Log in remotely to your Name Node and check progress from JobTracker portal
with the returned JobID...
File dependencies to include with job:
[Auto-detected] D:\HadoopClient\HadoopClient\bin\Debug\HadoopClient.vshost.exe
[Auto-detected] D:\HadoopClient\HadoopClient\bin\Debug\HadoopClient.exe
[Auto-detected] D:\HadoopClient\HadoopClient\bin\Debug\Microsoft.Hadoop.MapReduce.dll
[Auto-detected] D:\HadoopClient\HadoopClient\bin\Debug\Microsoft.Hadoop.WebClient.dll
[Auto-detected] D:\HadoopClient\HadoopClient\bin\Debug\Newtonsoft.Json.dll
[Auto-detected] D:\HadoopClient\HadoopClient\bin\Debug\Microsoft.Hadoop.Client.dll
Job job_201309161139_003 completed.
I commented out the cluster-management method calls in the Main() function because we are focusing only on the MapReduce job. Also, you may see a message about deleting the output folder if it already exists.
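For context, the following is a minimal sketch of what a DoCustomMapReduce() method built on Microsoft.Hadoop.MapReduce might look like. The word-count mapper and reducer, the HDFS paths, and the parameterless Hadoop.Connect() call are illustrative assumptions, not this project's exact code:

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Hadoop.MapReduce;

//Hypothetical word-count mapper: emits <word, 1> for each token
public class WordCountMapper : MapperBase
{
    public override void Map(string inputLine, MapperContext context)
    {
        foreach (string word in
                 inputLine.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            context.EmitKeyValue(word, "1");
        }
    }
}

//Hypothetical reducer: sums the counts emitted for each word
public class WordCountReducer : ReducerCombinerBase
{
    public override void Reduce(string key, IEnumerable<string> values,
                                ReducerCombinerContext context)
    {
        context.EmitKeyValue(key, values.Count().ToString());
    }
}

//Ties the mapper and reducer together and configures the job
public class WordCountJob : HadoopJob<WordCountMapper, WordCountReducer>
{
    public override HadoopJobConfiguration Configure(ExecutorContext context)
    {
        return new HadoopJobConfiguration
        {
            InputPath = "/demo/input",      //hypothetical HDFS input path
            OutputFolder = "/demo/output",  //hypothetical HDFS output path
            DeleteOutputFolder = true       //source of the "deleting output folder" message
        };
    }
}

class Program
{
    static void DoCustomMapReduce()
    {
        //Connect to the local one-box cluster; the SDK auto-detects the
        //file dependencies listed in the console output and ships them
        //with the job
        IHadoop hadoop = Hadoop.Connect();
        hadoop.MapReduceJob.ExecuteJob<WordCountJob>();
    }

    static void Main()
    {
        DoCustomMapReduce();
    }
}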
Note
If, for some reason, the required environment variables are not set, you may get errors like the following while executing the project, indicating that the environment is not configured correctly:
Environment Vairable not set: HADOOP_HOME
Environment Vairable not set: Java_HOME
If you encounter this situation, add the following two lines of code at the top of your DoCustomMapReduce() method to set the variables:
//This value is constant for the default Hadoop installation path
Environment.SetEnvironmentVariable("HADOOP_HOME", @"c:\hadoop");
//Needs to point to the Java installation on the development machine
Environment.SetEnvironmentVariable("Java_HOME", @"c:\hadoop\jvm");
On successful completion, the job returns the job ID. Using that ID, you can track the job's details in the Hadoop MapReduce Status (JobTracker) portal by connecting remotely to the NameNode. Figure 5-4 shows the preceding job's execution history in the JobTracker web application.
Figure 5-4. JobTracker portal
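As a convenience, you can also construct the JobTracker detail-page URL directly from the returned job ID. This sketch assumes the Hadoop 1.x JobTracker web UI on its default port (50030) and uses a placeholder host name:

//Build the JobTracker detail-page URL for the returned job ID
//"namenode-host" is a placeholder for your cluster's head node
string jobId = "job_201309161139_003";
string url = string.Format(
    "http://namenode-host:50030/jobdetails.jsp?jobid={0}", jobId);
Console.WriteLine("Track the job at: {0}", url);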