Note
When prompted for credentials, provide hadoop as the user name and type any text as the password. This is
essentially a dummy credential prompt, which is needed to maintain compatibility with the Azure service from
PowerShell scripts.
Future Directions
With hardware costs having decreased considerably over the years, organizations are leaning toward appliance-based
data-processing engines. An appliance is a combination of hardware units and built-in software programs suited
to a specific kind of workload. Though Microsoft has no plans to offer a multinode HDInsight solution for use on
premises, it does offer an appliance-based, multiunit, massively parallel processing (MPP) device called the
Parallel Data Warehouse (PDW). Microsoft PDW gives you performance and scalability for data warehousing with the
plug-and-play simplicity of an appliance. Some nodes in the appliance can run SQL PDW, and some nodes can run
Hadoop (called a Hadoop Region). A new data-processing technology called PolyBase has been introduced, which is
designed to be the simplest way to combine nonrelational data and traditional relational data for your analysis. It acts
as a bridge that allows SQL PDW to send queries to Hadoop and fetch the results. Users can therefore send
regular SQL queries to PDW, and Hadoop can run them and fetch data from unstructured files. To learn more about
PDW and PolyBase, see the following Microsoft page:
http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/polybase.aspx
The open source Apache Hadoop project is going through a lot of changes as well. In the near future, Hadoop
version 2.0 will be widely available. Hadoop 2.0 introduces a new resource-management layer called Yet Another
Resource Negotiator (YARN), on which MapReduce runs as one application among others. This is also known as
MapReduce 2.0 or MRv2. Because HDInsight uses Hadoop internally, it is highly likely that the Azure Service and the
Emulator will be upgraded to Hadoop 2.0 in due course. The underlying architecture, however, will remain the same in
terms of job submissions and end-user interactions; hence, the impact of this change on readers and users will be minimal.
Summary
The HDInsight offering is essentially a cloud service from Microsoft. Because even evaluating the Windows Azure
HDInsight Service involves some cost, an emulator is available as a single-node box product for your Windows
Server systems, which you can use as a playground to test and evaluate the technology. The Windows Azure
HDInsight Emulator uses the same software bits as the Azure Service and supports exactly the same functionality.
HDInsight is designed to be scalable and to perform massively parallel processing, and you can test your Big Data
solution on the emulator. Once you are satisfied, you can deploy your actual solution to production in Azure and take advantage of
multinode Hadoop clusters on Windows Azure. For on-premises use, Microsoft offers its Parallel Data Warehouse
(PDW) appliance, in which some nodes can run Hadoop as a Hadoop Region, while the emulator will continue to be
single node and serve as a test bed.
 
 