NOTE
A preprocessed file called MovieRatings.csv is supplied and can be
downloaded with the Chapter 12 materials from
http://www.wiley.com/go/microsoftbigdatasolutions.
After you have your source data file, you must upload it to your HDInsight
cluster. To upload the data file to your cluster and add it to HDFS, follow
these steps:
1. Connect to the head node of your HDInsight cluster using Remote
Desktop. If Remote Desktop has not been enabled, enable it through the
Configuration page and set up an account for Remote Access.
2. Copy and paste the MovieRatings.csv file from your local machine
to the c:\temp folder on the head node of your HDInsight cluster. If
the c:\temp folder does not exist, you will need to create it first.
3. On the desktop, double-click the Hadoop command prompt to open the
Hadoop command line.
4. We will use the Hadoop File System shell to copy the data file from the
local file system to the Hadoop Distributed File System (HDFS). Enter
the following command at the command prompt:
hadoop fs -copyFromLocal c:\temp\MovieRatings.csv /user/<YOUR USERNAME>/chapter15/input/MovieRatings.csv
5. After the command completes, use the following command to verify that
the MovieRatings.csv file now exists within HDFS:
hadoop fs -ls /user/<YOUR USERNAME>/chapter15/input/
The data input required to run recommendation jobs using Mahout is now
available and ready for processing on HDInsight.
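Before uploading, it can be worth sanity-checking the file locally, since Mahout's recommenders consume comma-separated userID,itemID,rating triples and a malformed line is easier to fix on your workstation than after the job fails on the cluster. The following Python sketch (not part of the book's materials; the three-field numeric format is an assumption about MovieRatings.csv) validates a ratings file line by line:

```python
# Hypothetical pre-upload check (assumption: the ratings file holds
# comma-separated userID,itemID,rating triples, as Mahout's item-based
# recommender input expects).
import csv
import io

def validate_ratings(lines):
    """Parse lines as userID,itemID,rating; return (valid_rows, errors)."""
    rows, errors = [], []
    for lineno, row in enumerate(csv.reader(lines), start=1):
        if len(row) != 3:
            errors.append((lineno, "expected 3 fields, got %d" % len(row)))
            continue
        try:
            user, item = int(row[0]), int(row[1])
            rating = float(row[2])
        except ValueError:
            errors.append((lineno, "non-numeric field in %r" % (row,)))
            continue
        rows.append((user, item, rating))
    return rows, errors

# Example run against made-up ratings data:
sample = io.StringIO("1,101,5.0\n1,102,3.0\n2,101,2.5\nbad,line\n")
rows, errors = validate_ratings(sample)
print(len(rows), len(errors))  # 3 valid triples, 1 malformed line
```

To check the real file, replace the in-memory sample with `open(r"c:\temp\MovieRatings.csv")` and inspect any reported errors before running the hadoop fs -copyFromLocal command.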