Configuring Your First Big Data Environment - Microsoft Big Data Solutions

Figure 3.17 Results of Hive query.

As you can see, the output provides a great deal of information beyond just the answer to the query (28,745,465). Your answer may differ depending on which, and how many, of the flight info files you loaded into HDFS. Hive also reports the time the query took to complete, how many reads and writes occurred, and the name of the job used in the process. The job name is vitally important if you need to troubleshoot errors.

Finally, you can drop the table:

DROP TABLE IF EXISTS flightinfo;

And exit Hive:

exit;

Open up a new Hadoop command prompt and enter the following:

hadoop fs -ls flightinfo

Notice that the data is still there. Hive external tables, such as the one we created earlier, are simply metadata descriptions of the data residing in HDFS. You can create and drop external tables without affecting the underlying data; this is one of the great powers of Hive that you will learn more about in Chapter 10, “Adding Structure with Hive.”
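The behavior described above can be sketched in HiveQL. This is a minimal illustration, not the table definition used earlier in the chapter: the table name flightinfo_demo, the three columns, the comma delimiter, and the HDFS path (including the user name hadoopuser) are all hypothetical placeholders that you would replace with your own values.

```sql
-- Hypothetical schema and location: adjust the columns, the delimiter,
-- and the path to match the files you actually loaded into HDFS.
CREATE EXTERNAL TABLE IF NOT EXISTS flightinfo_demo (
    flight_year INT,
    carrier     STRING,
    arr_delay   INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hadoopuser/flightinfo';

-- The "Table Type" row in the output should read EXTERNAL_TABLE.
DESCRIBE FORMATTED flightinfo_demo;

-- Removes only the table metadata; the files under LOCATION stay in HDFS.
DROP TABLE IF EXISTS flightinfo_demo;
```

After running the DROP statement, listing the directory again with hadoop fs -ls should still show the data files, because Hive never took ownership of them.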

Verifying Pig

Pig is a procedural scripting language that you will learn more about in Chapter 8, “Effective Big Data ETL with SSIS, Pig, and Sqoop.” To quickly test Pig, you are going to run a word-count program that you often see in MapReduce examples. Any *.txt file will do, but one good example is the Davinci.txt file available from the examples in the HDInsight Service Samples directory. You can get to this directory by logging in to the Windows Azure portal and clicking Manage Cluster. Samples is one of the
