Both versions of the cluster ship with stable components of HDP and the underlying Hadoop ecosystem.
However, I recommend the latest version, which is 2.1 as of this writing. The latest version includes the most recent
enhancements and updates from the open source community, along with fixes for bugs that were reported
against previous versions. For those reasons, my preference is to run the latest available version unless there is
a specific reason to stay on an older one.
Note
The component versions associated with HDInsight cluster versions may change in future updates to HDInsight. One
way to determine the available components and their versions is to log in to a cluster using Remote Desktop, go
directly to the cluster's name node, and then examine the contents of the C:\apps\dist\ directory.
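As a rough illustration of that inspection step, the following Python sketch lists the subdirectories of a distribution directory. It assumes that each installed component has its own subdirectory under C:\apps\dist\ and that Python is available on the node; neither is guaranteed. The function name list_component_dirs is hypothetical and not part of HDInsight.

```python
import os


def list_component_dirs(dist_dir):
    """Return the sorted names of the subdirectories directly under dist_dir."""
    return sorted(
        entry.name for entry in os.scandir(dist_dir) if entry.is_dir()
    )


if __name__ == "__main__":
    # Path taken from the note above; it exists only on a cluster node.
    dist_dir = r"C:\apps\dist"
    if os.path.isdir(dist_dir):
        for name in list_component_dirs(dist_dir):
            print(name)
    else:
        print("Directory not found:", dist_dir)
```

Run on the name node over Remote Desktop, this would print one line per subdirectory. Browsing the folder in Windows Explorer gives the same information.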
Storage Location Options
When you create a Hadoop cluster on Azure, you should understand the different storage mechanisms. Windows
Azure has three types of storage available: blob, table, and queue:
Blob storage: Binary Large Objects (blob) should be familiar to most developers. Blob storage
is used to store things like images, documents, or videos—something larger than a first name
or address. Blob storage is organized by containers that can have two types of blobs: Block and
Page. The type of blob needed depends on its usage and size. Block blobs are limited to 200
GB, while Page blobs can go up to 1 TB. Blob storage can be accessed via REST APIs with a
URL such as http://debarchans.blob.core.windows.net/MyBLOBStore .
Table storage: Azure tables should not be confused with tables from an RDBMS like SQL
Server. They are composed of a collection of entities and properties, with properties further
containing collections of name, type, and value. One thing I particularly don't like as a
developer is that Azure tables can't be accessed using ADO.NET methods. As with all other
Azure storage methods, access is provided through REST APIs, with a URL such as
http://debarchans.table.core.windows.net/MyTableStore .
Queue storage: Queues are used to transport messages between applications. Azure queues
are conceptually the same as Microsoft Message Queuing (MSMQ), except that they are
for the cloud. Again, REST API access is available, with a URL such as
http://debarchans.queue.core.windows.net/MyQueueStore .
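The three example URLs above follow one pattern: the storage account name, then the service type, then core.windows.net. The Python sketch below builds such an endpoint from its parts. The helper name storage_endpoint is hypothetical and used only for illustration. The sketch builds the URL string only and makes no REST call.

```python
def storage_endpoint(account, service, resource=""):
    """Build an endpoint URL in the pattern used by the examples above."""
    if service not in ("blob", "table", "queue"):
        raise ValueError("unknown storage service: " + service)
    base = "http://{0}.{1}.core.windows.net".format(account, service)
    # Append the container, table, or queue name only when one is given.
    return base + "/" + resource if resource else base


print(storage_endpoint("debarchans", "blob", "MyBLOBStore"))
# prints http://debarchans.blob.core.windows.net/MyBLOBStore
```

Changing the service argument to "table" or "queue" yields the other two endpoint forms shown in the list.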
Note
HDInsight supports only Azure Blob storage.
Azure Storage Accounts
The HDInsight provisioning process requires a Windows Azure Storage account to be used as the default file system. The
storage locations are referred to as Windows Azure Storage Blob (WASB), and the prefix WASB: is used to access
them. WASB is actually a thin wrapper on the underlying Windows Azure Blob Storage (WABS) infrastructure that
exposes blob storage as HDFS in HDInsight; it is a notable change in Microsoft's implementation of Hadoop on
Windows Azure. (Learn more about WASB in the upcoming section "Understanding the Windows Azure Storage Blob.")
For instructions on creating a storage account, see the following URL:
http://www.windowsazure.com/en-us/manage/services/storage/how-to-create-a-storage-account/
 
 