Database Reference
In-Depth Information
public class CascadingCopyPipe {
public static void main(String[] args) {
String input = args[0];
String output = args[1];
Properties properties = new Properties();
AppProps.setApplicationJarClass(properties, CascadingCopyPipe.class);
HadoopFlowConnector flowConnector =
new HadoopFlowConnector(properties);
Tap inTap = new Hfs(new TextDelimited(true, ","), input);
Tap outTap = new Hfs(new TextDelimited(true, ","), output);
Pipe copyPipe = new Pipe("Copy Pipeline");
FlowDef flowDef = FlowDef.flowDef()
.addSource(copyPipe, inTap)
.addTailSink(copyPipe, outTap);
flowConnector.connect(flowDef).complete();
}
}
Creating a Cascade: A Simple JOIN Example
In order to build more workflows, we need to be able to connect various pipes of data
together. Cascading's abstraction model provides several types of pipe assemblies that can
be used to filter, group, and join data streams.
Just as with water pipes, it is possible for Cascading to split data streams into two
separate pipes, or merge various pipes of data into one. In order to connect two data
streams into one, we can chain pipes together using a Cascading CoGroup process.
Using a CoGroup, we can merge two streams using either a common key or a set
of common tuple values. This operation should be familiar to users of SQL, as it is
very similar to a simple JOIN . Cascading CoGroup f lows require that the values being
grouped together have the same type. Listing 9.5 presents a CoGroup example.
Listing 9.5 Using CoGroup() to join two streams on a value
import java.util.Properties;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.CoGroup;
import cascading.pipe.Pipe;
import cascading.pipe.joiner.InnerJoin;
 
 
Search WWH ::




Custom Search