Loading Data
You can feed data to your Hive tables by simply copying data files into the appropriate folders. A table's definition is
purely a metadata schema that is applied to the data files in the folders when they are queried. This makes it easy to
define tables in Hive for data that is generated by other processes and deposited in the appropriate folders when ready.
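As a minimal sketch of this approach, the following HiveQL defines an external table over a folder that another process
fills with comma-separated files. The table name, column names, and folder path are hypothetical and are not part of
this chapter's sample; adjust them to match your own data.

```sql
-- Hypothetical example: the table name, columns, and location are illustrative assumptions.
CREATE EXTERNAL TABLE stock_feed_example (
    stock_symbol STRING,
    trade_date   STRING,
    close_price  DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/example/stockfeed/';

-- Files copied into /example/stockfeed/ later are picked up by the next query; no reload step is needed.
SELECT COUNT(*) FROM stock_feed_example;
```

Because the table is declared EXTERNAL, dropping it removes only the metadata and leaves the data files in place.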
Additionally, you can use the HiveQL LOAD statement to load data from an existing file into a Hive table. This
statement moves the file from its current location to the folder associated with the table. LOAD performs no
transformation on the data; it is purely a copy/move operation that places data files in the location corresponding to
the Hive table. This is useful when you need to create a table from the results of a MapReduce job or Pig script that
generates an output file alongside log and status files. The technique lets you add only the output data to the table,
without having to deal with the additional files you do not want to include.
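The general shape of the statement can be sketched as follows. The paths and the table name are hypothetical
placeholders, and the target table is assumed to exist already with a storage format that matches the file.

```sql
-- Hypothetical paths and table name, for illustration only.

-- Move a single job output file from the cluster's file system into the table's folder.
LOAD DATA INPATH '/example/joboutput/part-r-00000'
INTO TABLE job_results_example;

-- With OVERWRITE, the existing contents of the table are replaced instead of appended to.
LOAD DATA INPATH '/example/joboutput/part-r-00001'
OVERWRITE INTO TABLE job_results_example;

-- With LOCAL, the file is read from the local file system and copied rather than moved.
LOAD DATA LOCAL INPATH '/tmp/example/results.csv'
INTO TABLE job_results_example;
```

Because only the named file is loaded, any log or status files in the same output folder stay where they are.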
For example, Listing 8-6 shows how to load data into the stock_analysis table created earlier. You can execute
the following PowerShell script, which loads data from tableMSFT.csv.
Listing 8-6. Loading data to a Hive table
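# Placeholder values; replace them with your own subscription, storage account, container, and cluster.
$subscriptionName = "YourSubscriptionName"
$storageAccountName = "democluster"
$containerName = "democlustercontainer"
$clusterName = "democluster"

# HiveQL statement to run. The wasb:// path must not contain a line break.
$queryString = "load data inpath 'wasb://democlustercontainer@democluster.blob.core.windows.net/debarchan/StockData/tableMSFT.csv' " +
    "into table stock_analysis partition(exchange ='NASDAQ');"

# Define the Hive job.
$HiveJobDefinition = New-AzureHDInsightHiveJobDefinition -Query $queryString

# Submit the job to the cluster. A trailing backtick continues the command on the next line.
$HiveJob = Start-AzureHDInsightJob -Subscription $subscriptionName -Cluster $clusterName `
    -JobDefinition $HiveJobDefinition

# Wait up to one hour for the job to complete.
$HiveJob | Wait-AzureHDInsightJob -Subscription $subscriptionName -WaitTimeoutInSeconds 3600

# Retrieve the job's standard error output.
Get-AzureHDInsightJobOutput -Cluster $clusterName -Subscription $subscriptionName `
    -JobId $HiveJob.JobId -StandardError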
Note
Depending on the PowerShell editor you use, you may need to place each command on a single line to avoid syntax
errors.
You should see output similar to the following once the job completes:
StatusDirectory : 0b2e0a0b-e89b-4f57-9898-3076c10fddc3
ExitCode : 0
Name : Hive: load data inpath 'wa
Query : load data inpath 'wasb://democlustercontainer@democluster.blob.core.windows.net/
debarchan/StockData/tableMSFT.csv' into table stock_analysis
partition(exchange ='NASDAQ');
 
 