Composable Data at Cerner - Hadoop: The Definitive Guide

Database Reference

In-Depth Information

new MapFn < PersonRecord , Pair < String , String >>() {

@Override

public Pair < String , String > map ( PersonRecord input ) {

return Pair . of ( input . getSourceId (), input . getPersonId ());

}

}, pairs ( strings (), strings ()));

Joining the two PTable objects will return a PTable<Pair<String, String>,

Pair<EMPIRecord, PersonRecord>> . In this situation, the keys are no longer

useful, so we change the table to be keyed by the EMPI identifier:

PTable < String , PersonRecord > personRecordKeyedByEMPI =

keyedPersonRecords

. join ( keyedEmpiRecords )

. values ()

. by ( new MapFn < Pair < PersonRecord , EMPIRecord >>() {

@Override

public String map ( Pair < PersonRecord , EMPIRecord > input ) {

return input . second (). getEmpiId ();

}

}, strings ()));

The final step is to group the table by its key to ensure all of the data is aggregated togeth-

er for processing as a complete collection:

PGroupedTable < String , PersonRecord > groupedPersonRecords =

personRecordKeyedByEMPI . groupByKey ();

The PGroupedTable would contain data like that in Table 22-4 .

This logic to unify data sources is the first step of a larger execution flow. Other Crunch

functions downstream build on these steps to meet many client needs. In a common use

case, a number of problems are solved by loading the contents of the unified PersonRe-

cord s into a rules-based processing model to emit new clinical knowledge. For instance,

we may run rules over those records to determine if a diabetic is receiving recommended

care, and to indicate areas that can be improved. Similar rule sets exist for a variety of

needs, ranging from general wellness to managing complicated conditions. The logic can

be complicated and with a lot of variance between use cases, but it is all hosted in func-

tions composed in a Crunch pipeline.

Search WWH ::

Custom Search

Home