Note
Windows Azure MSDN documentation has a sample C# wordcount program that implements both the mapper and reducer classes: http://www.windowsazure.com/en-us/documentation/articles/hdinsight-sample-csharp-streaming/.
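The linked sample is a streaming-style implementation. Purely as an illustration of the same split into mapper and reducer roles, here is a minimal word-count sketch using the Microsoft.Hadoop.MapReduce SDK classes that this chapter uses; the class names and the splitting logic are my own illustration, not code taken from the MSDN sample.

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Hadoop.MapReduce;

namespace HadoopClient
{
    // Splits each input line into words and emits a (word, "1") pair per occurrence.
    class WordCountMapper : MapperBase
    {
        public override void Map(string inputLine, MapperContext context)
        {
            foreach (var word in inputLine.Split(new[] { ' ', '\t' },
                StringSplitOptions.RemoveEmptyEntries))
            {
                context.EmitKeyValue(word, "1");
            }
        }
    }

    // Adds up the counts emitted for each word and writes the total.
    class WordCountReducer : ReducerCombinerBase
    {
        public override void Reduce(string key, IEnumerable<string> values,
            ReducerCombinerContext context)
        {
            context.EmitKeyValue(key, values.Sum(v => long.Parse(v)).ToString());
        }
    }
}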
Once the Mapper and Reducer classes are defined, you need to implement the HadoopJob class. This consists of
the configuration information for your job—for example, the input data and the output folder path. Listing 5-4 shows
the code snippet for the SquareRootJob class implementation.
Listing 5-4. SquareRootJob.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.Hadoop.MapReduce;

namespace HadoopClient
{
    class SquareRootJob : HadoopJob<SquareRootMapper>
    {
        public override HadoopJobConfiguration Configure(ExecutorContext context)
        {
            var config = new HadoopJobConfiguration
            {
                InputPath = Constants.wasbPath + "/example/data/Numbers.txt",
                OutputFolder = Constants.wasbPath + "/example/data/SquareRootOutput"
            };
            return config;
        }
    }
}
I chose /example/data as the input path where I placed my source file, Numbers.txt. The output will be generated in the /example/data/SquareRootOutput folder. This output folder is overwritten each time the job runs, so if you want to preserve the output of an earlier run, change the output folder name before each job execution (one way to do this is sketched below).
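As a sketch of that idea (the timestamp suffix is my own illustration, not part of Listing 5-4), you could build the output folder name from the run time inside Configure:

    public override HadoopJobConfiguration Configure(ExecutorContext context)
    {
        // Suffix the output folder with the run time so earlier results are not overwritten.
        var config = new HadoopJobConfiguration
        {
            InputPath = Constants.wasbPath + "/example/data/Numbers.txt",
            OutputFolder = Constants.wasbPath + "/example/data/SquareRootOutput_" +
                           DateTime.Now.ToString("yyyyMMddHHmmss")
        };
        return config;
    }

This would replace only the Configure method in Listing 5-4; the rest of the job class stays the same.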
Note
Per the configuration specified in the job class, you need to upload the input file Numbers.txt, and the job will write its output to a folder called SquareRootOutput in Windows Azure Storage Blob (WASB). Both paths are under the /example/data directory of the democlustercontainer container in the democluster storage account, as specified by the constant wasbPath in the Constants.cs class.
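Constants.cs itself is not reproduced here. As a rough sketch of what it might contain, assuming the wasb:// URI scheme (older HDInsight preview releases used asv:// instead) together with the democluster account and democlustercontainer container named above:

namespace HadoopClient
{
    // Illustrative only: the actual value in Constants.cs may differ.
    static class Constants
    {
        public const string wasbPath =
            "wasb://democlustercontainer@democluster.blob.core.windows.net";
    }
}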
 
 