Note
HDInsight currently supports Hive, Pig, Oozie, Sqoop, and HCatalog out of the box. The plan is to also ship
HBase and Flume in future versions. The beauty of HDInsight (or any other distribution) is that it is implemented on top
of the Hadoop core. So you can install and configure any of these supporting projects on the default install. There is also
every possibility that HDInsight will support more of these projects going forward, depending on user demand.
Microsoft HDInsight: Hadoop on Windows
HDInsight is Microsoft's implementation of a Big Data solution with Apache Hadoop at its core. HDInsight is 100
percent compatible with Apache Hadoop and is built on open source components in conjunction with Hortonworks,
a company focused on getting Hadoop adopted on the Windows platform. Basically, Microsoft has taken the open
source Hadoop project, added the functionality needed to make it run on Windows (because Hadoop
was developed primarily for Linux), and submitted the project back to the community. All of the components are retested in typical
scenarios to ensure that they work together correctly and that there are no versioning or compatibility issues.
I'm a great fan of such integration because I can see the boost it might provide to the industry, and I was excited
by the news that the open source community has included Windows-compatible Hadoop in its main project
trunk. Developments in HDInsight are regularly fed back to the community through Hortonworks so that they can
maintain compatibility and contribute to the fantastic open source effort.
Microsoft's Hadoop-based distribution brings the robustness, manageability, and simplicity of Windows to the
Hadoop environment. The focus is on hardening security through integration with Active Directory, thus making it
enterprise ready, simplifying manageability through integration with System Center 2012, and dramatically reducing
the time required to set up and deploy via simplified packaging and configuration.
These improvements will enable IT to apply consistent security policies across Hadoop clusters and manage them
from a single pane of glass on System Center 2012. Further, Microsoft SQL Server and its powerful BI suite can be leveraged
to apply analytics and generate interactive business intelligence reports, all under the same roof. For the Hadoop-based
service on Windows Azure, Microsoft has further lowered the barrier to deployment by enabling the seamless setup and
configuration of Hadoop clusters through an easy-to-use, web-based portal and offering Infrastructure as a Service (IaaS).
Microsoft is currently the only company offering scalable Big Data solutions in the cloud and for on-premises use. These
solutions are all built on a common Microsoft Data Platform with familiar and powerful BI tools.
HDInsight is available in two flavors that will be covered in subsequent chapters of this topic:
Windows Azure HDInsight Service: This is a service available to Windows Azure subscribers
that uses Windows Azure clusters and integrates with Windows Azure storage. An Open
Database Connectivity (ODBC) driver is available to connect the output from HDInsight
queries to data analysis tools.
Windows Azure HDInsight Emulator: This is a single-node, single-box product that you
can install on Windows Server 2012 or in your Hyper-V virtual machines. The purpose of
the emulator is to provide a development environment for testing and evaluating your
solution before deploying it to the cloud. You save money by not paying for Azure hosting until
your solution is developed, tested, and ready to run. The emulator is available for free
and will continue to be a single-node offering.
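To make the ODBC point above concrete, here is a minimal Python sketch of how a data analysis script might pull HDInsight query results through an ODBC data source. It is an illustration under stated assumptions, not the product's documented procedure: the third-party pyodbc package, the DSN name "HDInsightHive", and the table name "weblogs" are all hypothetical choices that do not come from the text, and the DSN is assumed to have been configured already with the Hive ODBC driver.

```python
def build_connection_string(dsn: str) -> str:
    """Return an ODBC connection string for a pre-configured DSN."""
    return f"DSN={dsn}"


def top_rows_query(table: str, limit: int = 10) -> str:
    """Build a simple HiveQL query that returns the first rows of a table."""
    # Accept only letters, digits, and underscores so that the table name
    # cannot smuggle extra statements into the query text.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    return f"SELECT * FROM {table} LIMIT {int(limit)}"


def fetch_rows(dsn: str, table: str, limit: int = 10):
    """Run the query through ODBC and return the rows as a list."""
    # pyodbc is a third-party package (an assumption of this sketch); it is
    # imported here so the helpers above remain usable without it installed.
    import pyodbc

    # Assumption: the Hive driver does not support transactions, so
    # autocommit is enabled when the connection is opened.
    conn = pyodbc.connect(build_connection_string(dsn), autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(top_rows_query(table, limit))
        return cursor.fetchall()
    finally:
        conn.close()
```

With a working DSN, a call such as fetch_rows("HDInsightHive", "weblogs", 5) would return up to five rows that a reporting or analysis tool could then consume. The two helper functions need neither a driver nor a cluster, so they can be checked on their own.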
While keeping all these details about Big Data and Hadoop in mind, it would be incorrect to think of HDInsight
as a stand-alone solution. HDInsight is, in fact, a component of the Microsoft Data
Platform and part of the company's overall data acquisition, management, and visualization strategy.
Figure 1-4 shows the bigger picture, with applications, services, tools, and frameworks that work together and
allow you to capture data, store it, and visualize the information it contains. Figure 1-4 also shows where HDInsight
fits into the Microsoft Data Platform.
 
 