13/12/10 01:37:50 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/12/10 01:37:50 INFO mapred.JobClient: File Output Format Counters
13/12/10 01:37:50 INFO mapred.JobClient: Bytes Written=0
13/12/10 01:37:50 INFO mapred.JobClient: FileSystemCounters
13/12/10 01:37:50 INFO mapred.JobClient: WASB_BYTES_READ=3027416
13/12/10 01:37:50 INFO mapred.JobClient: FILE_BYTES_READ=3696
13/12/10 01:37:50 INFO mapred.JobClient: HDFS_BYTES_READ=792
13/12/10 01:37:50 INFO mapred.JobClient: FILE_BYTES_WRITTEN=296608
13/12/10 01:37:50 INFO mapred.JobClient: File Input Format Counters
13/12/10 01:37:50 INFO mapred.JobClient: Bytes Read=0
13/12/10 01:37:50 INFO mapred.JobClient: Map-Reduce Framework
13/12/10 01:37:50 INFO mapred.JobClient: Map input records=36153
13/12/10 01:37:50 INFO mapred.JobClient: Physical memory (bytes) snapshot=779915264
13/12/10 01:37:50 INFO mapred.JobClient: Spilled Records=0
13/12/10 01:37:50 INFO mapred.JobClient: CPU time spent (ms)=17259
13/12/10 01:37:50 INFO mapred.JobClient: Total committed heap usage (bytes)=2058092544
13/12/10 01:37:50 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2608484352
13/12/10 01:37:50 INFO mapred.JobClient: Map output records=36153
13/12/10 01:37:50 INFO mapred.JobClient: SPLIT_RAW_BYTES=792
13/12/10 01:37:50 INFO mapreduce.ExportJobBase: Transferred 792 bytes in 53.6492 seconds (14.7626 bytes/sec)
13/12/10 01:37:50 INFO mapreduce.ExportJobBase: Exported 36153 records.
As you can see, Sqoop is a handy import/export tool for your cluster's data, making it easy to move data to and from a SQL Azure database. Sqoop lets you bring structured and unstructured data together in one place so that you can run analytics across the combined data set. For a complete reference of all the available Sqoop commands, visit the Apache documentation site at https://cwiki.apache.org/confluence/display/SQOOP/Home.
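As a quick illustration, a minimal export command takes the following shape. The server, database, credentials, table, and export directory shown here are hypothetical placeholders, not values from this chapter; substitute your own before running the command:

sqoop export --connect "jdbc:sqlserver://<yourserver>.database.windows.net:1433;database=<yourdatabase>" --username <youruser>@<yourserver> --password <yourpassword> --table <yourtable> --export-dir <path-to-export>

The --export-dir argument points at the HDFS (or WASB) directory holding the data to push out, and --table names the destination table in the SQL Azure database.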
The Pig Console
Pig is a set-based data transformation tool that works on top of the Hadoop stack, letting you filter, aggregate, and reshape data sets. Pig is most analogous to the Data Flow task in SQL Server Integration Services (SSIS), as discussed in Chapter 10.
Unlike SSIS, Pig does not have a control-flow system. Pig is written in Java and compiles its statements into Java .jar files that run as MapReduce jobs across the nodes of the Hadoop cluster, manipulating the data in a distributed way. Pig exposes a command-line shell called Grunt for executing Pig statements. To launch the Grunt shell, navigate to the c:\apps\dist\pig-0.11.0.1.3.1.0-06\bin directory from the Hadoop Command Line and execute the pig command. That should launch the Grunt shell, as shown in Listing 6-12.
Listing 6-12. Launching the Pig Grunt shell
c:\apps\dist\pig-0.11.0.1.3.1.0-06\bin>pig
2013-12-10 01:48:10,150 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.0.1.3.1.0-06
(r: unknown) compiled Oct 02 2013, 21:58:30
2013-12-10 01:48:10,151 [main] INFO org.apache.pig.Main - Logging error messages to:
C:\apps\dist\hadoop-1.2.0.1.3.1.0-06\logs\pig_1386640090147.log
2013-12-10 01:48:10,194 [main] INFO org.apache.pig.impl.util.Utils
- Default bootup file D:\Users\hadoopuser/.pigbootup not found
2013-12-10 01:48:10,513 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine
- Connecting to hadoop file system at: wasb://democlustercontainer@democluster.blob.core.windows.net
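Once the grunt> prompt appears, you can enter Pig Latin statements interactively. The following sketch shows the typical load/transform/dump pattern; the file path and schema here are hypothetical examples, not data from this chapter:

grunt> logs = LOAD 'wasb:///example/data/sample.log' USING PigStorage(' ') AS (level:chararray, message:chararray);
grunt> errors = FILTER logs BY level == 'ERROR';
grunt> counts = FOREACH (GROUP errors BY level) GENERATE group, COUNT(errors);
grunt> DUMP counts;

Note that Pig evaluates lazily: the LOAD, FILTER, and FOREACH statements only define the data flow, and no MapReduce job is submitted to the cluster until an output statement such as DUMP or STORE is executed.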