Database Reference
Figure 3.17 Results of Hive query.
As you can see, the output provides a great deal of information beyond just the
answer to the query (28,745,465). Your answer may differ depending on
which and how many of the flight info files you loaded into HDFS. Hive
also reports the time the query took to complete, how many reads and
writes occurred, and the name of the job used in the process. The job
name is essential if you need to troubleshoot errors.
Finally, you can drop the table:
DROP TABLE IF EXISTS flightinfo;
And exit Hive:
exit;
Open up a new Hadoop command prompt and enter the following:
hadoop fs -ls flightinfo
Notice that the data is still there. Hive external tables, such as the one
you created earlier, are simply metadata descriptions of the data residing in
HDFS. You can create and drop tables without affecting the underlying data;
this is one of the great powers of Hive that you will learn more about in
Chapter 10, “Adding Structure with Hive.”
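To see this behavior in isolation, you could run a short sequence like the following in the Hive prompt. This is only a sketch: the table name flightinfo_demo, its two columns, the comma delimiter, and the path /user/hadoop/flightinfo are all assumptions made for illustration, so substitute the schema and HDFS location you actually used.

```sql
-- Hypothetical table name, columns, delimiter, and path (assumptions for illustration only).
-- An external table records only metadata; the files stay wherever LOCATION points.
CREATE EXTERNAL TABLE IF NOT EXISTS flightinfo_demo (
  flight_year INT,
  carrier     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hadoop/flightinfo';

-- Dropping the external table removes the table definition from the metastore only.
DROP TABLE IF EXISTS flightinfo_demo;

-- List the directory from inside the Hive prompt; the data files should still be present.
dfs -ls /user/hadoop/flightinfo;
```

Had the table been created without the EXTERNAL keyword and loaded as a managed table, dropping it would also delete its data, which is why the distinction matters here.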
Verifying Pig
Pig is a procedural scripting language that you will learn more about in
Chapter 8, “Effective Big Data ETL with SSIS, Pig, and Sqoop.” To quickly
test Pig, you are going to run a word-count program that you often see
in MapReduce examples. Any *.txt file will do, but one good example
is the Davinci.txt file available from the examples in the HDInsight
Service Samples directory. You can get to this directory by logging in to the
Windows Azure portal and clicking Manage Cluster. Samples is one of the
 