My Approach
My approach in this topic is to build the various tools into one large system, stage by stage. Starting with the
Hadoop Distributed File System (HDFS), which is Hadoop's big data file system, I do the following for each tool:
Introduce the tool
Show how to obtain the installation package
Explain how to install it, with examples
Employ examples to show how it can be used
Because I have many tools and functions to introduce, I take only a brief look at each one. Rather than covering
any single tool in depth, I show you how each one can be used as an individual part of a big data system. I hope you
will investigate them further in your own time.
The Hadoop platform tool set is installed on CentOS Linux 6.2. I use Linux because it is free to download and
has a small footprint on my servers. I use CentOS rather than another free version of Linux because some of the
Hadoop tools have been released for CentOS only. For instance, at the time of writing, Ambari is not available
for Ubuntu Linux.
Throughout the topic, you will learn how you can build a big data system using low-cost, commodity hardware.
I relate the use of these big data tools to various IT roles and follow a step-by-step approach to show that they
are within the reach of most IT professionals. Along the way, I point out solutions to some common problems you might
encounter and describe the benefits you can gain from Hadoop tools. I use small volumes of data to
demonstrate the systems, tools, and ideas; however, the tools scale to very large volumes of data.
Some knowledge of Linux and, to a certain extent, Java is assumed. Don't be put off by this; instead, think
of it as an opportunity to learn a new area if you aren't familiar with the subject.
Overview of the Big Data System
While many organizations may not yet have the volumes of data that could be defined as big data, all need to consider
their systems as a whole. A large organization might have a single big data repository. In any event, it is useful to
investigate these technologies as preparation for meeting future needs.
Big Data Flow and Storage
Many of the principles governing business intelligence and data warehousing scale to big data proportions. For
instance, Figure 1-2 depicts a data warehouse system in general terms.