Database Reference
In-Depth Information
Figure 5-4. Hadoop V2 Fair scheduler
As shown in Figure 5-4 , the V2 fair scheduler is configured and working correctly.
Using Oozie for Workflow
Capacity, Fair, and similar plug-in schedulers deal with resources allocated to individual jobs over a period of time.
However, what about the relationships between jobs and the dependencies between them? That's where workflow
managers, like Apache's Oozie, come in. This section examines Hadoop job-based workflows and scheduling, and
demonstrates how tools like Oozie enable you to manage related jobs as workflows.
A workflow scheduler for Hadoop, Oozie is integrated into many of the Hadoop tools, such as Pig, Hive, Map
Reduce, and Streaming. Oozie workflows are defined as directed acyclical graphs (DAGs) and are stored as XML.
In this section, I will demonstrate how to install the Oozie component that is part of the Cloudera installation
that was used in Chapter 2. I show how to check that it is working, and how to create an example workflow. For further
details, check the Apache Software Foundation website at oozie.apache.org .
Installing Oozie
You start by installing the Oozie client and server as root, using Yum, the Linux-based package manager.
You install Oozie on the single CentOS 6 Linux server hc1nn, as follows:
[root@hc1nn conf]# yum install oozie
[root@hc1nn conf]# yum install oozie-client
 
Search WWH ::




Custom Search