Detailed descriptions of all the core Hadoop commands are beyond the scope of this topic. For a complete listing
and description, you can refer to Apache's user manual on Hadoop commands at
http://hadoop.apache.org/docs/r1.0.4/commands_manual.html.
It is worth reiterating here that the HDInsight Service only simulates HDFS behavior for the end user. All the
cluster data is actually stored in Windows Azure Storage Blob (WASB), in cluster-specific containers. As you may
remember, the core-site.xml file must contain an entry with the Azure storage account and its account key so that
the cluster can access the Azure blobs and function correctly. Here is the snippet of our cluster's core-site.xml,
which uses the democluster storage account as its cluster storage:
<property>
<name>fs.azure.account.key.democluster.blob.core.windows.net</name>
<value>********************************************************************** </value>
</property>
So the output folder and the file you just created are actually in your blob container for democluster. To confirm
this, you can go to your Windows Azure Management Portal and see the blobs you just created as part of your cluster's
data, as shown in Figure 6-9.
Figure 6-9. WASB container for democluster
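To double-check which storage account a cluster is wired to, the core-site.xml entry shown earlier can also be read programmatically. The following is a minimal Python sketch, not part of HDInsight itself: it assumes the property name always follows the fs.azure.account.key.<storage host> pattern seen in the snippet, and the XML fragment, the PLACEHOLDER_KEY value, and the storage_accounts helper are all hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the core-site.xml snippet in the text.
# The key value is a placeholder, not a real account key.
CORE_SITE_FRAGMENT = """
<configuration>
  <property>
    <name>fs.azure.account.key.democluster.blob.core.windows.net</name>
    <value>PLACEHOLDER_KEY</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp</value>
  </property>
</configuration>
"""

# Prefix taken from the property name shown in the text.
KEY_PREFIX = "fs.azure.account.key."


def storage_accounts(xml_text):
    """Return {storage host: key} for every fs.azure.account.key.* property."""
    root = ET.fromstring(xml_text)
    accounts = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name.startswith(KEY_PREFIX):
            # Everything after the prefix is the storage host name.
            accounts[name[len(KEY_PREFIX):]] = value
    return accounts


if __name__ == "__main__":
    # Print only the host names; account keys should not be echoed.
    for host in storage_accounts(CORE_SITE_FRAGMENT):
        print(host)
```

On a real cluster you would pass the contents of the actual core-site.xml file instead of the inline fragment. Properties that do not start with the prefix are simply ignored.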
The Hive Console
Hive is an abstraction over HDFS and MapReduce. It enables you to define a table-like schema structure on the
underlying HDFS (actually, WASB in HDInsight), and it provides a SQL-like query language to read data from the
tables. The Hadoop Command Line also gives you access to the Hive console, from which you can directly execute
Hive Query Language (HQL) statements to create, select, join, sort, and perform many other operations on the
cluster data. Internally, the HQL queries are broken down into MapReduce jobs that execute and generate the desired
output, which is returned to the user. To launch the Hive console, navigate to the
c:\apps\dist\hive-0.11.0.1.3.1.0-06\bin\ folder from the Hadoop Command Line and execute the hive command. This
should start the Hive command prompt, as shown in Listing 6-6.
Listing 6-6. The Hive console
c:\apps\dist\hive-0.11.0.1.3.1.0-06\bin>hive
Logging initialized using configuration in file:/C:/apps/dist/hive-0.11.0.1.3.1.0-06/conf/hive-log4j.properties
hive>
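To make the earlier point about HQL queries being broken down into MapReduce jobs more concrete, here is a conceptual Python sketch of how an aggregation such as SELECT word, COUNT(*) FROM words GROUP BY word could be expressed as map, shuffle, and reduce steps. This is only an illustration under stated assumptions: the words table, the sample rows, and all function names are hypothetical, and the code does not reproduce the execution plan Hive actually generates.

```python
from collections import defaultdict

# Hypothetical rows standing in for a Hive table with a single column, `word`.
ROWS = ["hadoop", "hive", "hadoop", "pig", "hive", "hadoop"]


def map_phase(rows):
    """Emit a (key, 1) pair per row, as a mapper might for COUNT(*)."""
    for word in rows:
        yield word, 1


def shuffle_phase(pairs):
    """Group the emitted values by key, imitating the shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key, as a reducer might."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_word(rows):
    """Chain the three phases to mimic a GROUP BY ... COUNT(*) query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    for word, count in sorted(count_by_word(ROWS).items()):
        print(word, count)
```

In a real cluster the map and reduce steps run in parallel across nodes and the shuffle moves data between them. The single-process version above only shows how the query's logic divides into those phases.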
If you run the show tables command, you will see output similar to what you saw when you ran your Hive job
from the .NET program in Chapter 5, as shown in Listing 6-7.
 