Accessing HDInsight over Hive and ODBC - Pro Microsoft HDInsight: Hadoop on Windows


You can use the SKEWED BY clause to store rows in separate files, one for each value in a specified list of values for a given column. Rows with values not in the list are stored together in a single other file.

You can use the CLUSTERED BY clause to distribute data across a specified number of files (described as buckets), based on a hash of the values of specified columns.
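As a rough illustration of the two clauses, the sketch below holds two HiveQL CREATE TABLE statements in PowerShell here-strings, ready to be passed to whichever Hive submission method you use. The table names, column names, skewed values, and bucket count are all hypothetical and are not taken from the chapter's listings.

```powershell
# Hypothetical table: rows whose stock_symbol is 'MSFT' or 'IBM' get their own files;
# every other symbol shares a single file.
$skewedTableDdl = @'
CREATE TABLE stock_skewed (stock_symbol STRING, stock_date STRING, stock_price_close DOUBLE)
SKEWED BY (stock_symbol) ON ('MSFT', 'IBM');
'@

# Hypothetical table: rows are hashed on stock_symbol and spread across 4 buckets.
$bucketedTableDdl = @'
CREATE TABLE stock_bucketed (stock_symbol STRING, stock_date STRING, stock_price_close DOUBLE)
CLUSTERED BY (stock_symbol) INTO 4 BUCKETS;
'@
```

Single-quoted here-strings are used so that PowerShell does not try to expand anything inside the HiveQL text.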

There are a few ways to execute Hive queries against your HDInsight cluster:

• Using the Hadoop Command Line

• Using .NET SDK

• Using Windows Azure PowerShell

In this chapter, we use Windows Azure PowerShell to create, populate, and query Hive tables. The Hive tables are based on demo stock data for the following companies:

• Apple

• Facebook

• Google

• MSFT

• IBM

• Oracle

Let's first load the input files into the Windows Azure Storage Blob (WASB) container that our democluster uses by executing the PowerShell script in Listing 8-1. The input files used in this topic are a subset of the stock market dataset available for free at www.infochimps.com and are provided separately.

Listing 8-1. Uploading files to WASB

$subscriptionName = "<YourSubscriptionname>"
$storageAccountName = "democluster"
$containerName = "democlustercontainer"

# This path may vary depending on where you place the source .csv files.
$fileName = "D:\HDIDemoLab\TableFacebook.csv"
$blobName = "Tablefacebook.csv"

# Get the storage account key
Select-AzureSubscription $subscriptionName
$storageaccountkey = Get-AzureStorageKey $storageAccountName | %{ $_.Primary }

# Create the storage context object
$destContext = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageaccountkey

# Copy the file from local workstation to the Blob container
Set-AzureStorageBlobContent -File $fileName -Container $containerName -Blob $blobName -Context $destContext

■ Note: Repeat these steps for the other .csv files in the folder by changing the $fileName and $blobName variables and rerunning Set-AzureStorageBlobContent.
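If you would rather not edit the variables by hand for each file, the repetition could be scripted along the following lines. This is a minimal sketch, not part of the book's listing: Get-UploadPlan and Send-UploadPlan are hypothetical helper names, the second file name is assumed, and the blob name is simply taken from the local file name (so its casing follows the file on disk rather than the "Tablefacebook.csv" spelling used in Listing 8-1).

```powershell
# Hypothetical helper: pair each local .csv path with a blob name.
# Assumption: the blob name is the file-name part of the path.
function Get-UploadPlan {
    param([string[]]$FilePaths)
    foreach ($path in $FilePaths) {
        [pscustomobject]@{
            File = $path
            # Split on either slash style and keep the last segment.
            Blob = ($path -split '[\\/]')[-1]
        }
    }
}

# Hypothetical helper: upload every entry in a plan.
# Needs the Azure cmdlets plus the container name and context built in Listing 8-1.
function Send-UploadPlan {
    param($Plan, [string]$ContainerName, $Context)
    foreach ($item in $Plan) {
        Set-AzureStorageBlobContent -File $item.File -Container $ContainerName -Blob $item.Blob -Context $Context
    }
}

# Only TableFacebook.csv is named in the text; the second file name is assumed.
$plan = Get-UploadPlan -FilePaths @(
    "D:\HDIDemoLab\TableFacebook.csv",
    "D:\HDIDemoLab\TableApple.csv"
)

# Once Listing 8-1 has run in the same session, the upload would be:
# Send-UploadPlan -Plan $plan -ContainerName $containerName -Context $destContext
```

Keeping the plan separate from the upload lets you inspect $plan and confirm the file-to-blob mapping before anything is sent to the storage account.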
