Composable Data at Cerner - Hadoop: The Definitive Guide


Processing is orchestrated with Oozie. Every time new data arrives, a new dataset is created with a unique identifier in a well-defined location in HDFS. Oozie coordinators watch that location and simply launch Crunch jobs to create downstream datasets, which may subsequently be picked up by other coordinators. At the time of this writing, datasets and updates are identified by UUIDs to keep them unique. However, we are in the process of placing new data in timestamp-based partitions in order to better work with Oozie's nominal time model.
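To make the difference between the two naming schemes concrete, here is a minimal Python sketch of how a dataset location might be derived in each case. It is not Cerner's code: the base directory `/data/datasets`, the dataset name `encounters`, the hourly granularity, and both helper functions are hypothetical, chosen only for illustration. The point is that a UUID path cannot be predicted from a schedule, whereas a timestamp-based path can be computed directly from the nominal time of a coordinator action.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical base directory; the text only says "a well-defined location in HDFS".
BASE = "/data/datasets"


def uuid_dataset_path(name: str, dataset_id: Optional[uuid.UUID] = None) -> str:
    """Location of a dataset identified by a UUID (the scheme described as current)."""
    dataset_id = dataset_id or uuid.uuid4()
    return f"{BASE}/{name}/{dataset_id}"


def partition_path(name: str, nominal_time: datetime) -> str:
    """Location of a dataset in an hourly, timestamp-based partition (assumed layout)."""
    if nominal_time.tzinfo is None:
        raise ValueError("nominal_time must be timezone-aware")
    # Normalize to UTC so the same instant always maps to the same directory.
    t = nominal_time.astimezone(timezone.utc)
    return f"{BASE}/{name}/{t:%Y/%m/%d/%H}"


if __name__ == "__main__":
    print(uuid_dataset_path("encounters"))
    print(partition_path("encounters", datetime(2014, 3, 7, 15, 30, tzinfo=timezone.utc)))
```

With the partitioned layout, a scheduler that knows only the nominal time can work out which directory to wait for, which is roughly what a time-templated dataset location in a coordinator relies on. With UUIDs, the consumer has to discover the new directory by watching the parent location.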
