Summary
This chapter presented Hadoop-based schedulers and discussed their use for Hadoop V1 and V2. Remember that
each scheduler type is meant for a different scenario. The Capacity scheduler enables multiple tenants to share a
cluster of resources, while the Fair scheduler enables multiple projects for a single tenant to share a cluster. In both
cases, the aim is to share cluster resources appropriately. Keep checking the Hadoop website (hadoop.apache.org) for
version updates applicable to the scheduling function.
While these schedulers allow the sharing of resources, tools like Oozie let you schedule jobs that are organized
into workflows, triggering them by time and by event. Using an example, this chapter showed how to create a workflow
and how to schedule it. Additionally, the Oozie console was used to examine the job output and status.
As a final suggestion, you might consider investigating workflow schedulers like Azkaban and Luigi as well
to give you some idea of comparable functionality. Azkaban uses DAGs like Oozie, and it integrates with Hadoop
components like Pig and Hive. Luigi is a simple workflow engine written in Python; at the time of this writing, it
integrates with Hive but not with Pig.
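
To make the DAG idea concrete, here is a minimal sketch in plain Python of how a workflow engine can derive a valid run order from job dependencies. It is not Oozie, Azkaban, or Luigi code; the job names and the execution_order helper are hypothetical, and it assumes Python 3.9 or later for the standard graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a job, and each value is the set of
# jobs that must finish before it can start.
workflow = {
    "load_raw_data": set(),
    "clean_data": {"load_raw_data"},
    "pig_aggregate": {"clean_data"},
    "hive_report": {"clean_data"},
    "publish": {"pig_aggregate", "hive_report"},
}


def execution_order(dag):
    """Return one valid run order in which every job follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

The two middle jobs do not depend on each other, so an engine is free to run them in either order or in parallel; only the dependency edges are guaranteed to be respected. A cycle in the dependencies would raise graphlib.CycleError, which is why these tools require the workflow to be acyclic.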
 