Submitting Jobs to Your HDInsight Cluster - Pro Microsoft HDInsight: Hadoop on Windows


■ Note: The Windows Azure MSDN documentation has a sample C# wordcount program that implements both the mapper

streaming/ .

Once the Mapper and Reducer classes are defined, you need to implement the HadoopJob class. This class holds
the configuration information for your job, for example, the input data and the output folder path. Listing 5-4 shows
the SquareRootJob class implementation.

Listing 5-4. SquareRootJob.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.Hadoop.MapReduce;

namespace HadoopClient
{
    // Job definition: ties SquareRootMapper to its input file and output folder.
    class SquareRootJob : HadoopJob<SquareRootMapper>
    {
        public override HadoopJobConfiguration Configure(ExecutorContext context)
        {
            var config = new HadoopJobConfiguration
            {
                // Source file in WASB; upload Numbers.txt here before running the job.
                InputPath = Constants.wasbPath + "/example/data/Numbers.txt",
                // Folder in WASB that receives the job output.
                OutputFolder = Constants.wasbPath + "/example/data/SquareRootOutput"
            };
            return config;
        }
    }
}

■ Note: I chose \example\data as the input path where I would have my source file, Numbers.txt. The output will be
generated in the \example\data\SquareRootOutput folder. This output folder will be overwritten each time the job runs.
If you want to preserve an existing job output folder, make sure to change the output folder name each time before job
execution.
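One way to avoid overwriting earlier results is to derive a new output folder name for every run. The following is only a minimal sketch of that idea, not part of the book's listing: the OutputFolders class and its Unique method are hypothetical names introduced here, and the timestamp format is an arbitrary choice.

```csharp
using System;
using System.Globalization;

static class OutputFolders
{
    // Hypothetical helper (not in the original listing): appends a UTC
    // timestamp so that each job run writes to a differently named folder.
    public static string Unique(string baseFolder, DateTime utcNow)
    {
        string stamp = utcNow.ToString("yyyyMMdd-HHmmss", CultureInfo.InvariantCulture);
        return baseFolder + "-" + stamp;
    }
}

static class Program
{
    static void Main()
    {
        // Same base folder as in the job configuration, with a per-run suffix.
        string folder = OutputFolders.Unique("/example/data/SquareRootOutput", DateTime.UtcNow);
        Console.WriteLine(folder);
    }
}
```

Under these assumptions, the Configure method could set OutputFolder to Constants.wasbPath plus the generated name instead of a fixed string.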

Per the configuration specified in the job class, you need to upload the input file Numbers.txt, and the job
will write its output to a folder called SquareRootOutput in Windows Azure Storage Blob (WASB). Both are located in
the \example\data directory of the democlustercontainer container in the democluster storage account, as specified by the
constant wasbPath in the Constants.cs class.
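To make the path composition concrete, the sketch below shows how a WASB root of the general form wasb://<container>@<account>.blob.core.windows.net combines with the relative paths used in the job class. The Constants.cs file itself is not shown in this section, so the WasbPaths helper is a hypothetical stand-in, and the resulting value is an assumption based on the container and account names mentioned above.

```csharp
using System;

static class WasbPaths
{
    // Hypothetical helper: builds a WASB root URI of the form
    // wasb://<container>@<account>.blob.core.windows.net
    public static string Root(string container, string account)
    {
        return "wasb://" + container + "@" + account + ".blob.core.windows.net";
    }
}

static class Program
{
    static void Main()
    {
        // Names taken from the text; the actual value in Constants.cs may differ.
        string wasbPath = WasbPaths.Root("democlustercontainer", "democluster");

        // The same concatenation the job configuration performs.
        Console.WriteLine(wasbPath + "/example/data/Numbers.txt");
        Console.WriteLine(wasbPath + "/example/data/SquareRootOutput");
    }
}
```

With these inputs, the first line printed is wasb://democlustercontainer@democluster.blob.core.windows.net/example/data/Numbers.txt, which is where the input file would need to be uploaded.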
