Processing is orchestrated with Oozie. Every time new data arrives, a new dataset is
created with a unique identifier in a well-defined location in HDFS. Oozie coordinators
watch that location and launch Crunch jobs to create downstream datasets, which
may in turn be picked up by other coordinators. At the time of this writing, datasets
and updates are identified by UUIDs to keep them unique. However, we are in the process
of moving new data into timestamp-based partitions, which work better with Oozie's
nominal time model.
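To make the difference between the two naming schemes concrete, here is a minimal sketch in Python. It is not the authors' code: the base directory /data/incoming, the function names, and the year/month/day/hour layout are all assumptions chosen for illustration. The point is only that a UUID-named directory tells a scheduler nothing about when the data belongs, while a time-partitioned path can be derived from a nominal time alone.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical base location in HDFS; the source does not name the real one.
BASE = "/data/incoming"


def uuid_dataset_path(base=BASE, dataset_id=None):
    """Current scheme: each dataset lands in a directory named by a UUID."""
    dataset_id = dataset_id or uuid.uuid4()
    return f"{base}/{dataset_id}"


def partition_path(nominal_time, base=BASE):
    """Assumed future scheme: one directory per hour of nominal time (UTC)."""
    if nominal_time.tzinfo is None:
        raise ValueError("nominal_time must be timezone-aware")
    t = nominal_time.astimezone(timezone.utc)
    return f"{base}/{t:%Y/%m/%d/%H}"


if __name__ == "__main__":
    print(uuid_dataset_path())
    print(partition_path(datetime(2014, 3, 5, 7, 30, tzinfo=timezone.utc)))
```

A layout like this lines up with the way an Oozie coordinator dataset is usually declared, where the uri-template is built from variables such as ${YEAR}, ${MONTH}, ${DAY}, and ${HOUR}, so the coordinator can compute the expected input path for each nominal time instead of discovering UUID-named directories.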