Table 4.2 WebHDFS Access Commands

File System Command   WebHDFS Equivalent
mkdir                 PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=MKDIRS"
rm                    DELETE "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=DELETE"
ls                    GET "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"
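Because every WebHDFS request follows the same URL pattern, the commands in Table 4.2 can be sketched in a few lines of code. The helper below is a minimal illustration, not part of the WebHDFS API itself; the host name and port are assumptions (newer Hadoop NameNodes commonly listen on HTTP port 9870, older releases on 50070):

```python
def webhdfs_url(host, port, path, op, user=None):
    """Build the WebHDFS v1 REST URL for an operation on an HDFS path.

    `path` must begin with '/', and `op` is one of the WebHDFS
    operation names such as MKDIRS, DELETE, or LISTSTATUS.
    """
    url = f"http://{host}:{port}/webhdfs/v1{path}?op={op}"
    if user:
        # Optional: identify the caller on clusters without Kerberos.
        url += f"&user.name={user}"
    return url

# mkdir -> HTTP PUT
print(webhdfs_url("namenode.example.com", 9870, "/tmp/demo", "MKDIRS"))
# rm    -> HTTP DELETE
print(webhdfs_url("namenode.example.com", 9870, "/tmp/demo", "DELETE"))
# ls    -> HTTP GET
print(webhdfs_url("namenode.example.com", 9870, "/tmp", "LISTSTATUS"))
```

The URL returned for each operation would then be issued with the HTTP verb shown in Table 4.2, for example with curl or any HTTP client library.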
Now that you are familiar with the basic concepts behind HDFS, let's look at
some of the other functionality that is built on top of HDFS.
Exploring Hive: The Hadoop Data Warehouse Platform
Within the Hadoop ecosystem, HDFS can load and store massive quantities
of data in an efficient and reliable manner. It can also serve that same data
back up to client applications, such as MapReduce jobs, for processing and
data analysis.
Although this is a productive and workable paradigm for someone with a
developer's background, it does little for an analyst or data scientist
trying to sort through potentially large sets of data, as was the case at
Facebook.
Hive, often considered the Hadoop data warehouse platform, got its start at
Facebook as its analysts struggled to deal with the massive quantities of
data produced by the social network. Requiring analysts to learn and write
MapReduce jobs was neither productive nor practical.
Instead, Facebook developed a data warehouse-like layer of abstraction
based on tables. The tables function merely as metadata; the table schema
is projected onto the data rather than actually moving potentially massive
sets of data. This new capability allowed analysts to use a SQL-like
language called Hive Query Language (HQL) to query massive data sets stored
in HDFS and to perform both simple and sophisticated summarizations and
data analysis.
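As an illustrative sketch of this idea (the table and column names here are invented for the example, not taken from the text), an analyst might declare a schema over files already sitting in HDFS and then summarize them with HQL, without writing any MapReduce code:

```sql
-- Hypothetical external table: the schema is pure metadata projected
-- onto existing HDFS files; no data is moved or converted.
CREATE EXTERNAL TABLE page_views (
  view_time  TIMESTAMP,
  user_id    BIGINT,
  page_url   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/page_views';

-- A simple summarization: views per page, busiest pages first.
SELECT page_url, COUNT(*) AS views
FROM page_views
GROUP BY page_url
ORDER BY views DESC
LIMIT 10;
```

Under the hood, Hive translates such a query into one or more jobs that run against the data in HDFS, which is precisely what spared Facebook's analysts from writing those jobs by hand.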