Chapter 1
Introducing HDInsight
HDInsight is Microsoft's distribution of “Hadoop on Windows.” Microsoft has embraced Apache Hadoop to provide
business insight to all users interested in turning raw data into meaning by analyzing all types of data, structured or
unstructured, of any size. The new Hadoop-based distribution for Windows offers IT professionals ease of use by
simplifying the acquisition, installation and configuration experience of Hadoop and its ecosystem of supporting
projects in a Windows environment. Thanks to smart packaging of Hadoop and its toolset, customers can install and
deploy Hadoop in hours instead of days using the user-friendly and flexible cluster deployment wizards.
This new Hadoop-based distribution from Microsoft enables customers to derive business insights on structured
and unstructured data of any size and activate new types of data. Rich insights derived by analyzing Hadoop data can
be combined seamlessly with the powerful Microsoft Business Intelligence Platform. The rest of this chapter will focus
on the current data-mining trends in the industry, the limitations of modern-day data-processing technologies, and
the evolution of HDInsight as a product.
What Is Big Data, and Why Now?
All of a sudden, everyone has money for Big Data. From small start-ups to mid-sized companies and large enterprises,
businesses are now keen to invest in and build Big Data solutions to generate more intelligent data. So what is Big
Data all about?
In my opinion, Big Data is the new buzzword for a data mining technology that has been around for quite some
time. Data analysts and business managers are fast adopting techniques like predictive analysis, recommendation
services, and clickstream analysis that were commonly at the core of data processing in the past, but which have
been ignored or lost in the rush to implement modern relational database systems and structured data storage. Big
Data encompasses a range of technologies and techniques that allow you to extract useful and previously hidden
information from large quantities of data that previously might have been left dormant and, ultimately, thrown away
because storage for it was too costly.
Big Data solutions aim to provide data storage and querying functionality for situations that are, for various reasons,
beyond the capabilities of traditional database systems. For example, analyzing social media sentiments for a brand
has become a key parameter for judging a brand's success. Big Data solutions provide a mechanism for organizations to
extract meaningful, useful, and often vital information from the vast stores of data that they are collecting.
Big Data is often described as a solution to the “three V's problem”:
Variety: It's common for 85 percent of your new data to not match any existing data
schema. Not only that, it might very well also be semi-structured or even unstructured
data. This means that applying schemas to the data before or during storage is no longer a
practical option.
Volume: Big Data solutions typically store and query thousands of terabytes of data, and
the total volume of data is probably growing by ten times every five years. Storage solutions
must be able to manage this volume, be easily expandable, and work efficiently across
distributed systems.
 