CUST_ID = 100 ; PROD_ID = 10 ; EMPID = 100 ; NAME = Bill
CUST_ID = 150 ; PROD_ID = 20 ; EMPID = 150 ; NAME = Sebastian
It's interesting to consider how the code would look in an equivalent Cascading app:
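import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.local.LocalFlowConnector;
import cascading.pipe.CoGroup;
import cascading.pipe.Pipe;
import cascading.scheme.local.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.local.FileTap;
import cascading.tuple.Fields;
import cascading.tuple.TupleEntryIterator;

// Sources and sink: comma-delimited files with a header row and double-quoted values
Tap empTap = new FileTap(new TextDelimited(true, ",", "\""), "src/test/data/employee.txt");
Tap salesTap = new FileTap(new TextDelimited(true, ",", "\""), "src/test/data/salesfact.txt");
Tap resultsTap = new FileTap(new TextDelimited(true, ",", "\""),
    "build/test/output/results.txt", SinkMode.REPLACE);

Pipe empPipe = new Pipe("emp");
Pipe salesPipe = new Pipe("sales");

// Join employees to sales on empid = cust_id
Pipe join = new CoGroup(empPipe, new Fields("empid"), salesPipe, new Fields("cust_id"));

FlowDef flowDef = FlowDef.flowDef()
    .setName("flow")
    .addSource(empPipe, empTap)
    .addSource(salesPipe, salesTap)
    .addTailSink(join, resultsTap);

Flow flow = new LocalFlowConnector().connect(flowDef);
flow.complete(); // blocks until the flow finishes; start() returns immediately

TupleEntryIterator iterator = flow.openSink(); // reads the joined tuples back from resultsTap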
Arguably, that code is more compact than the JDBC use case. Even so, Lingual allows
for Cascading apps that read SQL queries from flat files supplied as command-line
options, which makes it possible to reuse a great number of existing ANSI SQL queries.
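As a rough sketch of that idea, the following reads a SQL query from a flat file named on the command line and runs it through plain JDBC, printing one COLUMN = value pair per column for each row. The class name SqlFileRunner, its helper methods, and the argument order are hypothetical and not part of Lingual or Cascading. The JDBC URL is passed in rather than hard-coded, since the exact Lingual connection string depends on the local setup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration; not part of Lingual or Cascading.
public class SqlFileRunner {

    // Reads the whole file as a single SQL statement, dropping one trailing semicolon.
    static String loadQuery(Path sqlFile) throws IOException {
        String sql = new String(Files.readAllBytes(sqlFile), StandardCharsets.UTF_8).trim();
        return sql.endsWith(";") ? sql.substring(0, sql.length() - 1).trim() : sql;
    }

    // Formats the current row as "COL = value ; COL = value".
    static String formatRow(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            if (i > 1) {
                sb.append(" ; ");
            }
            sb.append(meta.getColumnLabel(i)).append(" = ").append(rs.getString(i));
        }
        return sb.toString();
    }

    // Assumed usage: java SqlFileRunner <jdbc-url> <path/to/query.sql>
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: SqlFileRunner <jdbc-url> <query.sql>");
            return;
        }
        String sql = loadQuery(Paths.get(args[1]));
        // A suitable JDBC driver must be on the classpath for the given URL.
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(formatRow(rs));
            }
        }
    }
}
```

Because the query lives in its own file, an existing ANSI SQL statement can be pointed at a different JDBC URL without recompiling anything. This sketch handles only a single statement per file.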
Integrating with Desktop Tools
By virtue of having a JDBC connector into Cascading workflows on Apache Hadoop
clusters, we can leverage many existing SQL tools. For example, Toad is a popular tool
for interacting with SQL frameworks. RStudio (shown in Figure 6-4) is a popular IDE
for statistical computing in R, which can import data through JDBC.