■ Note  The Windows Azure MSDN documentation has a sample C# word count program that implements both the mapper and reducer classes:
http://www.windowsazure.com/en-us/documentation/articles/hdinsight-sample-csharp-
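The following is a minimal plain-C# sketch of what the mapper and reducer in a word count program each do. It is not the MSDN sample, and it does not use the Microsoft.Hadoop.MapReduce API. The WordCountSketch class and its Map, Reduce, and Run methods are hypothetical names. The in-memory GroupBy stands in for Hadoop's shuffle step, and the choice to lowercase words and split on a few punctuation characters is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain-C# simulation of the word count map and reduce phases.
// NOT the Microsoft.Hadoop.MapReduce API: class and method names are hypothetical.
public static class WordCountSketch
{
    private static readonly char[] Separators = { ' ', '\t', ',', '.', ';', ':' };

    // Map phase: emit a (word, 1) pair for every word in one input line.
    public static IEnumerable<KeyValuePair<string, int>> Map(string line)
    {
        foreach (var word in line.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            yield return new KeyValuePair<string, int>(word.ToLowerInvariant(), 1);
        }
    }

    // Reduce phase: sum all counts emitted for the same key (word).
    public static int Reduce(string key, IEnumerable<int> counts)
    {
        return counts.Sum();
    }

    // Map every line, group intermediate pairs by key (the "shuffle"),
    // then reduce each group to a single count.
    public static Dictionary<string, int> Run(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(Map)
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .ToDictionary(group => group.Key, group => Reduce(group.Key, group));
    }
}
```

For the two input lines "the quick fox" and "The lazy dog, the end", Run returns a count of 3 for "the" and 1 for each of the other five words. In Hadoop, the grouping happens across the cluster between the map and reduce tasks rather than in memory.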
Once the Mapper and Reducer classes are defined, you need to implement a job class that derives from HadoopJob. This class holds the configuration information for your job, for example, the input data path and the output folder path. Listing 5-4 shows the SquareRootJob class implementation. SquareRootJob passes only a mapper type, SquareRootMapper, as its generic argument.
Listing 5-4. SquareRootJob.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.Hadoop.MapReduce;

namespace HadoopClient
{
    // Job definition: binds SquareRootMapper to its input file and output folder.
    class SquareRootJob : HadoopJob<SquareRootMapper>
    {
        public override HadoopJobConfiguration Configure(ExecutorContext context)
        {
            var config = new HadoopJobConfiguration
            {
                // Source file, uploaded to WASB before the job runs
                InputPath = Constants.wasbPath + "/example/data/Numbers.txt",
                // Output folder; overwritten each time the job runs
                OutputFolder = Constants.wasbPath + "/example/data/SquareRootOutput"
            };
            return config;
        }
    }
}
■ Note  I chose /example/data as the input path where I keep my source file, Numbers.txt. The output is generated in the /example/data/SquareRootOutput folder, and this folder is overwritten each time the job runs. If you want to preserve an existing job's output folder, change the output folder name before each job execution.
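One way to keep earlier results is to derive a fresh output folder name for every run. The sketch below assumes a UTC timestamp suffix is acceptable; the OutputFolderNaming class and its WithRunSuffix method are hypothetical helpers, not part of the Hadoop SDK.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of the Hadoop SDK): builds a distinct output
// folder for each job run so an earlier run's output is not overwritten.
public static class OutputFolderNaming
{
    // Appends a UTC timestamp, to the second, to the base folder path, e.g.
    // ".../SquareRootOutput" -> ".../SquareRootOutput_20240102T030405".
    public static string WithRunSuffix(string baseFolder, DateTime runTimeUtc)
    {
        string stamp = runTimeUtc.ToString("yyyyMMdd'T'HHmmss", CultureInfo.InvariantCulture);
        // Drop any trailing slash so the suffix attaches to the folder name itself.
        return baseFolder.TrimEnd('/') + "_" + stamp;
    }
}
```

Inside Configure, you could then set OutputFolder to OutputFolderNaming.WithRunSuffix(Constants.wasbPath + "/example/data/SquareRootOutput", DateTime.UtcNow). Two runs started within the same second would still collide, so a GUID suffix is an alternative if jobs may start that close together.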
Per the configuration specified in the job class, you need to upload the input file Numbers.txt to Windows Azure Storage Blob (WASB), and the job will write its output data to a folder called SquareRootOutput there. Both live under the /example/data directory (set in the job class) of the democlustercontainer container in the democluster storage account (set by the wasbPath constant in the Constants.cs class).
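The wasbPath constant is the root URI of the blob container, so the job's paths are formed by appending container-relative paths to it. A minimal sketch of what Constants.cs might contain, assuming the standard wasb://<container>@<account>.blob.core.windows.net URI form; the StorageAccount and Container constants and the WasbPaths helper are hypothetical, and the book's actual Constants.cs may define different members.

```csharp
using System;

// Hypothetical Constants class; the real Constants.cs may differ.
// Shows how a WASB root URI is composed from a container and a storage account.
public static class Constants
{
    public const string StorageAccount = "democluster";
    public const string Container = "democlustercontainer";

    // Form: wasb://<container>@<account>.blob.core.windows.net
    public const string wasbPath =
        "wasb://" + Container + "@" + StorageAccount + ".blob.core.windows.net";
}

// Hypothetical helper: joins the WASB root with a container-relative path.
public static class WasbPaths
{
    public static string Resolve(string relativePath)
    {
        return Constants.wasbPath + "/" + relativePath.TrimStart('/');
    }
}
```

With these values, the job's InputPath resolves to wasb://democlustercontainer@democluster.blob.core.windows.net/example/data/Numbers.txt.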