The Problem with Data - Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset


Summary

While introducing the challenges and benefits of big data, this chapter also presented a set of requirements for big data systems and explained how they can be met by using the tools discussed in the remaining chapters of this book.

The aim of this book is to explain how to build a big data processing system by using the Hadoop tool set. Examples are used to explain the functionality provided by each Hadoop tool. Starting with HDFS for storage, followed by Nutch and Solr for data capture, each chapter covers a new area of functionality, providing a simple overview of storage, processing, and scheduling. With these examples and the step-by-step approach, you can build your knowledge of big data possibilities and grow your familiarity with these tools. By the end of Chapter 11, you will have learned about most of the major functional areas of a big data system.

As you read through this book, you should consider how to use the individual Hadoop components in your own systems. You will also notice a trend toward easier methods of system management and development. For instance, Chapter 2 starts with a manual installation of Hadoop, while Chapter 8 uses cluster managers. Chapter 4 shows handcrafted code for MapReduce programming, but Chapter 10 introduces visual, object-based MapReduce task development using Talend and Pentaho.

Now it's time to start, and we begin by looking at Hadoop itself. The next chapter introduces the Hadoop application and its uses, and shows how to configure and use it.
