Using the Emulator
Working with the emulator is much like using the Azure service, apart from a few minor changes. In particular,
if you modified the core-site.xml file to point to your Windows Azure Blob Storage, your Hadoop commands and
MapReduce function calls need very few changes. You can always use the Hadoop Command Line to
execute your MapReduce jobs. For example, to list the contents of your storage Blob container, you can run the ls
command as shown in Listing 7-3.
Listing 7-3. Executing the ls command
hadoop fs -ls wasb://democlustercontainer@democluster.blob.core.windows.net/
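The address in Listing 7-3 follows the pattern wasb://<container>@<storage account>.blob.core.windows.net/<path>. The following is a minimal shell sketch of how such an address is assembled. The function name wasb_uri is hypothetical and is not part of Hadoop or the Azure tooling. The container and account names are the ones used in the listing.

```shell
#!/bin/sh
# wasb_uri is a hypothetical helper, not part of Hadoop or the Azure tooling.
# Usage: wasb_uri <container> <storage-account> [path]
wasb_uri() {
    container=$1
    account=$2
    # Strip one leading slash so exactly one slash follows the host name.
    path=${3#/}
    printf 'wasb://%s@%s.blob.core.windows.net/%s\n' "$container" "$account" "$path"
}

# Reproduces the container root used in Listing 7-3.
# prints wasb://democlustercontainer@democluster.blob.core.windows.net/
wasb_uri democlustercontainer democluster
```

The result can be passed straight to a file system command, for example hadoop fs -ls "$(wasb_uri democlustercontainer democluster)".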
You can also execute MapReduce jobs using the command line. Listing 7-4 shows you a sample job you can
trigger from the Hadoop command prompt.
Listing 7-4. Using the Hadoop command line
hadoop jar "hadoop-examples.jar" "wordcount" "/example/data/gutenberg/davinci.txt" "/example/data/WordCountOutputEmulator"
You need to have the hadoop-examples.jar file at the root of your Blob container to execute the job
successfully.
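Because the job fails when the jar is missing, it can help to check for the file before submitting. The sketch below wraps both steps in one function. The name run_wordcount is hypothetical. The sketch assumes that hadoop is on the PATH and that the default file system already points at your Blob container through core-site.xml, so that /hadoop-examples.jar resolves to the container root.

```shell
#!/bin/sh
# run_wordcount is a hypothetical wrapper, not a Hadoop command.
# Assumes: `hadoop` is on PATH and the default file system is the Blob container.
# Usage: run_wordcount <jar-name> <input-path> <output-path>
run_wordcount() {
    jar=$1
    input=$2
    output=$3
    # `hadoop fs -test -e` exits with 0 only when the path exists.
    if ! hadoop fs -test -e "/$jar"; then
        echo "cannot find /$jar at the root of the container" >&2
        return 1
    fi
    # Same invocation as Listing 7-4, with the arguments passed separately.
    hadoop jar "$jar" wordcount "$input" "$output"
}
```

With the paths from Listing 7-4, the call would be run_wordcount hadoop-examples.jar /example/data/gutenberg/davinci.txt /example/data/WordCountOutputEmulator.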
Note
As with the Azure service, the recommended way to submit and execute MapReduce jobs is through the .NET
SDK or the PowerShell cmdlets. You can refer to Chapter 5 for such job submission and execution samples. The
changes are very minor, such as the cluster name, which is your local machine when you are using the emulator.
Listing 7-5 shows a sample PowerShell script you can use for your MapReduce job submissions.
Listing 7-5. MapReduce PowerShell script
$creds = Get-Credential
$cluster = "http://localhost:50111"
$inputPath = "wasb://democlustercontainer@democluster.blob.core.windows.net/example/data/gutenberg/davinci.txt"
$outputFolder = "wasb://democlustercontainer@democluster.blob.core.windows.net/example/data/WordCountOutputEmulatorPS"
$jar = "wasb://democlustercontainer@democluster.blob.core.windows.net/hadoop-examples.jar"
$className = "wordcount"
# Define the MapReduce job; the output location is held in $outputFolder
$hdinsightJob = New-AzureHDInsightMapReduceJobDefinition -JarFile $jar -ClassName $className -Arguments $inputPath, $outputFolder
# Submit the MapReduce job
$wordCountJob = Start-AzureHDInsightJob -Cluster $cluster -JobDefinition $hdinsightJob -Credential $creds
# Wait for the job to complete
Wait-AzureHDInsightJob -Job $wordCountJob -WaitTimeoutInSeconds 3600 -Credential $creds
 
 