Database Reference
In-Depth Information
new MapFn < PersonRecord , Pair < String , String >>() {
@Override
public Pair < String , String > map ( PersonRecord input ) {
return Pair . of ( input . getSourceId (), input . getPersonId ());
}
}, pairs ( strings (), strings ()));
Joining the two PTable objects will return a PTable<Pair<String, String>,
Pair<EMPIRecord, PersonRecord>> . In this situation, the keys are no longer
useful, so we change the table to be keyed by the EMPI identifier:
PTable < String , PersonRecord > personRecordKeyedByEMPI =
keyedPersonRecords
. join ( keyedEmpiRecords )
. values ()
. by ( new MapFn < Pair < PersonRecord , EMPIRecord >>() {
@Override
public String map ( Pair < PersonRecord , EMPIRecord > input ) {
return input . second (). getEmpiId ();
}
}, strings ()));
The final step is to group the table by its key to ensure all of the data is aggregated togeth-
er for processing as a complete collection:
PGroupedTable < String , PersonRecord > groupedPersonRecords =
personRecordKeyedByEMPI . groupByKey ();
The PGroupedTable would contain data like that in Table 22-4 .
This logic to unify data sources is the first step of a larger execution flow. Other Crunch
functions downstream build on these steps to meet many client needs. In a common use
case, a number of problems are solved by loading the contents of the unified PersonRe-
cord s into a rules-based processing model to emit new clinical knowledge. For instance,
we may run rules over those records to determine if a diabetic is receiving recommended
care, and to indicate areas that can be improved. Similar rule sets exist for a variety of
needs, ranging from general wellness to managing complicated conditions. The logic can
be complicated and with a lot of variance between use cases, but it is all hosted in func-
tions composed in a Crunch pipeline.
Search WWH ::




Custom Search