Querying HDFS
The Hadoop ecosystem provides the Pig and Hive frameworks to query data from
HDFS. In the latest versions of HD under the Pivotal endeavor, the HAWQ framework
(an SQL-like querying interface for HD) is being released. We will not be covering
HAWQ in this topic. Let's take a quick look at what the Pig and Hive frameworks
are all about and understand how HDFS data can be queried using some examples.
Hive
In this section, we will focus on understanding how to use Hive to access data stored
in HDFS. The following figure depicts Hive architecture.
Hive has the following dependencies to run successfully:
• Java 6
• Hadoop framework and Hadoop home directory configured
Hive internally translates queries into MapReduce jobs, so they run in parallel
across the cluster. Hive provides an SQL-like interface that can query data on HDFS.
For example:
1. Passing CSV data onto HDFS using the following commands:
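As a minimal sketch of this step, the following Python script writes a small CSV
file and copies it to HDFS with the standard hadoop fs -mkdir and hadoop fs -put
commands. The file name sample.csv, its rows, and the target directory
/user/hive/input are hypothetical values chosen for illustration, and the -p flag
assumes a Hadoop 2.x client. When no hadoop client is found on the PATH, the script
only prints the commands it would run.

```python
import csv
import shutil
import subprocess
from pathlib import Path


def write_sample_csv(path):
    """Write a tiny illustrative CSV file (the rows are made up)."""
    rows = [("1", "alice", "42"), ("2", "bob", "17")]
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path


def build_upload_commands(local_csv, hdfs_dir):
    """Return the two hadoop CLI invocations as argument lists."""
    return [
        # Create the target directory (-p assumes Hadoop 2.x or later).
        ["hadoop", "fs", "-mkdir", "-p", hdfs_dir],
        # Copy the local file into that HDFS directory.
        ["hadoop", "fs", "-put", str(local_csv), hdfs_dir],
    ]


def upload(local_csv, hdfs_dir):
    """Run the commands if a hadoop client is installed; otherwise print them."""
    have_client = shutil.which("hadoop") is not None
    for command in build_upload_commands(local_csv, hdfs_dir):
        if have_client:
            subprocess.run(command, check=True)
        else:
            print("would run:", " ".join(command))


if __name__ == "__main__":
    sample = write_sample_csv(Path("sample.csv"))
    # Hypothetical HDFS location; adjust to your cluster layout.
    upload(sample, "/user/hive/input")
```

Building the commands as argument lists, rather than as one shell string, avoids
quoting problems when a path contains spaces.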