Submitting the wordcount MapReduce Job
The .NET SDK for HDInsight also provides a simple way to execute your existing MapReduce programs or MapReduce
code written in Java. In this section, you will submit and execute the sample wordcount MapReduce job and display
its output from blob storage.
First, add a helper function that waits and displays progress while the MapReduce job is running. This is
important because the job submission calls might not be synchronous: they can return before the job finishes,
and you might see incorrect or intermediate output if you read from blob storage while the job is still
executing. Add the WaitForJobCompletion() method to your Program.cs file, as shown in Listing 5-8.
Listing 5-8. The WaitForJobCompletion() method
private static void WaitForJobCompletion(JobCreationResults jobResults, IJobSubmissionClient client)
{
    // Thread.Sleep requires "using System.Threading;" at the top of Program.cs.
    JobDetails jobInProgress = client.GetJob(jobResults.JobId);
    // Poll once per second until the job reports Completed or Failed.
    while (jobInProgress.StatusCode != JobStatusCode.Completed &&
           jobInProgress.StatusCode != JobStatusCode.Failed)
    {
        jobInProgress = client.GetJob(jobInProgress.JobId);
        Thread.Sleep(TimeSpan.FromSeconds(1));
        Console.Write(".");  // progress indicator
    }
}
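Listing 5-8 loops until the job reports a final status, so a job that never leaves its running state would keep the console application waiting indefinitely. The following is a minimal sketch of the same polling pattern with an upper time limit added. It is not part of the HDInsight SDK: the PollingSketch class, the PollUntil() helper, and all of its parameter names are hypothetical and introduced only for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper (not an SDK type): generic polling with a timeout.
public static class PollingSketch
{
    // Calls fetch() until isDone(result) returns true, pausing for
    // 'interval' between calls. Returns the last fetched value, or throws
    // TimeoutException once 'timeout' has elapsed without a final result.
    public static T PollUntil<T>(Func<T> fetch, Func<T, bool> isDone,
                                 TimeSpan interval, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();
        T current = fetch();
        while (!isDone(current))
        {
            if (clock.Elapsed > timeout)
            {
                throw new TimeoutException("Gave up waiting after " + timeout);
            }
            Thread.Sleep(interval);   // wait first, then ask again
            Console.Write(".");       // progress indicator, as in Listing 5-8
            current = fetch();
        }
        Console.WriteLine();
        return current;
    }
}
```

In the context of Listing 5-8, fetch would presumably be () => client.GetJob(jobResults.JobId) and isDone the same Completed-or-Failed check on StatusCode. Because the helper returns the last value it fetched, the caller can inspect the final status and treat a Failed job differently from a Completed one before reading the output.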
Then add the DoMapReduce() function to your Program.cs file. This function contains the code that submits
the wordcount job.
The first step is to create the job definition and configure the input and output parameters for the job. This is
done using the MapReduceJobCreateParameters class.
// Define the MapReduce job
MapReduceJobCreateParameters mrJobDefinition = new MapReduceJobCreateParameters()
{
    JarFile = "wasb:///example/jars/hadoop-examples.jar",
    ClassName = "wordcount"
};
mrJobDefinition.Arguments.Add("wasb:///example/data/gutenberg/davinci.txt");
mrJobDefinition.Arguments.Add("wasb:///example/data/WordCountOutput");
The next step, as usual, is to grab the correct certificate credentials based on the thumbprint:
var store = new X509Store();
store.Open(OpenFlags.ReadOnly);
var cert = store.Certificates.Cast<X509Certificate2>()
    .First(item => item.Thumbprint == Constants.thumbprint);
var creds = new JobSubmissionCertificateCredential(Constants.subscriptionId,
    cert, Constants.clusterName);
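If no certificate in the store matches the thumbprint, the First() call above throws an InvalidOperationException whose message does not say which certificate was missing. The following is a minimal sketch of a more explicit lookup that uses only standard .NET X509Store calls. The CertificateLookupSketch class and the FindByThumbprint() method name are hypothetical. The sketch assumes the certificate is installed in the current user's personal store, which is the store that the parameterless X509Store() constructor opens.

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper (not an SDK type): certificate lookup with a clear error.
public static class CertificateLookupSketch
{
    public static X509Certificate2 FindByThumbprint(string thumbprint)
    {
        // Assumption: the certificate lives in CurrentUser\My (personal store).
        var store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
        store.Open(OpenFlags.ReadOnly);
        try
        {
            // validOnly is false so that certificates that do not chain to a
            // trusted root (for example, self-signed ones) are still returned.
            X509Certificate2Collection matches = store.Certificates.Find(
                X509FindType.FindByThumbprint, thumbprint, false);
            if (matches.Count == 0)
            {
                throw new InvalidOperationException(
                    "No certificate with thumbprint " + thumbprint +
                    " was found in the current user's personal store.");
            }
            return matches[0];
        }
        finally
        {
            store.Close();
        }
    }
}
```

With this helper in place, the cert variable could be assigned from CertificateLookupSketch.FindByThumbprint(Constants.thumbprint), and the rest of the credential code would stay the same.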
Once the credentials are created, create a JobSubmissionClient object and submit the MapReduce job
based on the definition:
// Create a hadoop client to connect to HDInsight
var jobClient = JobSubmissionClientFactory.Connect(creds);
 