Chapter 3
Configuring Your First Big Data Environment
What You Will Learn in This Chapter
• Getting Started
• Finding the Installation Tools
• Running the Installation
• Validating Your New Cluster
• Learning Post-Install Tasks
In this chapter, you learn the steps necessary to get Hortonworks Data
Platform (HDP) and HDInsight Service installed and configured for your
use. You'll first walk through the install of HDP on a local Windows Server.
Next, you'll walk through installing HDInsight on Windows Azure. You'll then
verify both installs by working through some basic checks, such as analyzing log
files. Finally, you'll load some data into the Hadoop Distributed File System
(HDFS) and run some queries against it using Hive and Pig. This chapter
introduces you to, and prepares you for, many of the big data features you will
be using throughout the rest of the book.
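To give you an early feel for that last step, the following is a minimal sketch, not taken from this book, of copying a local file into HDFS and issuing a simple Hive statement by driving the command-line clients from Python. It assumes a working cluster with the hadoop and hive clients on the PATH (on the Windows HDP install these are invoked as hadoop.cmd and hive.cmd); the local file name and HDFS paths are placeholders, not values used later in the book.

import subprocess

# Create a landing directory in HDFS and upload a local sample file into it.
# "sample_data.csv" and "/user/demo/input" are placeholder names for this sketch.
subprocess.run(["hadoop", "fs", "-mkdir", "-p", "/user/demo/input"], check=True)
subprocess.run(["hadoop", "fs", "-put", "sample_data.csv", "/user/demo/input/"], check=True)

# List the directory to confirm the file landed in HDFS.
subprocess.run(["hadoop", "fs", "-ls", "/user/demo/input"], check=True)

# Run a Hive statement non-interactively; SHOW TABLES is a harmless first check.
subprocess.run(["hive", "-e", "SHOW TABLES;"], check=True)

You'll work through the full versions of these steps, with real data, later in this chapter.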
Getting Started
This chapter covers two common scenarios: a single-node Hadoop cluster
for simple testing and kicking the tires on Hadoop, and a four-node HDInsight
cluster in Windows Azure that shows you the vagaries of the cloud
environment. This chapter assumes that the on-premises cluster is being built
in a Hyper-V environment or other similar virtualization technology for your
initial development environment. (Later in this book, Chapter 16, “Operational
Big Data Management,” shows what an enterprise-class cluster may look like
when built so that you have a guideline for a production-class Hadoop
cluster.)
Hadoop is an enterprise solution that requires server-class software to run.
Therefore, the first thing you need for the installation of HDP is a copy of
one of the following: