Figure 3.15 Connecting to the RDP session.
Common Post-setup Tasks
Once you have created your cluster and verified that it works, you
should be itching to get some data on it and start kicking the tires. In the
next few steps, we'll load some real data into Hadoop and then check out
a couple of the most useful tools, Hive and Pig. (You'll learn more about
Hive and Pig in Chapter 6, “Adding Structure with Hive,” and Chapter 8,
“Effective Big Data ETL with SSIS, Pig, and Sqoop.”)
Loading Your First Files
Now that you have successfully installed HDP, it is time to get some data
loaded into HDFS so that you can verify the functionality of the system. A
favorite data set for playing around in HDP (and Hive in particular) is an
airline data set that shows all the flight and on-time information for airline
flights within the United States from 1987 to 2008. You can find the original
files at http://stat-computing.org/dataexpo/2009/the-data.html.
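As a rough illustration of what loading a file involves, the following Python sketch builds the hadoop fs commands that create a target directory in HDFS, copy one local file into it, and list the result. The file name 2008.csv and the HDFS directory /user/hadoop/airline are hypothetical placeholders, not values taken from this chapter, and the helper hdfs_load_commands is introduced here only for illustration. The -p flag on -mkdir is assumed to be supported (it is in Hadoop 2.x and later); older releases create parent directories by default, so it may need to be dropped there.

```python
import shutil
import subprocess


def hdfs_load_commands(local_file, hdfs_dir):
    """Return the `hadoop fs` invocations that copy one local file into HDFS."""
    return [
        # Create the target directory (-p assumed available; see note above).
        ["hadoop", "fs", "-mkdir", "-p", hdfs_dir],
        # Copy the local file into the HDFS directory.
        ["hadoop", "fs", "-put", local_file, hdfs_dir],
        # List the directory to confirm that the file arrived.
        ["hadoop", "fs", "-ls", hdfs_dir],
    ]


if __name__ == "__main__":
    # Hypothetical local file and HDFS directory; adjust to your environment.
    commands = hdfs_load_commands("2008.csv", "/user/hadoop/airline")
    hadoop_available = shutil.which("hadoop") is not None
    for cmd in commands:
        if hadoop_available:
            # Run the command and stop on the first failure.
            subprocess.run(cmd, check=True)
        else:
            # No hadoop client on the PATH: show what would be run.
            print("would run:", " ".join(cmd))
```

Run on a machine without the hadoop client on its PATH, the script only prints the commands, which makes it safe to try before pointing it at a real cluster.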
Basic File System Operations
HDFS is now available and ready to be loaded with data. Using the file
system command fs, you can list and create directories, read and move
 