CUST_ID=100; PROD_ID=10; EMPID=100; NAME=Bill
CUST_ID=150; PROD_ID=20; EMPID=150; NAME=Sebastian
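The listing above prints one line per result row, with each column shown as LABEL=value and columns separated by semicolons. A minimal sketch of a JDBC loop that produces output in that shape is shown next. It uses only java.sql; the class and method names (RowPrinter, formatRow, printAll) are hypothetical, and this is not necessarily the exact code that generated the listing.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.StringJoiner;

/** Hypothetical helper: prints JDBC rows as "LABEL=value; LABEL=value" lines. */
class RowPrinter {
    // Formats one row from parallel arrays of column labels and values.
    static String formatRow(String[] labels, Object[] values) {
        StringJoiner line = new StringJoiner("; ");
        for (int i = 0; i < labels.length; i++) {
            line.add(labels[i] + "=" + values[i]);
        }
        return line.toString();
    }

    // Walks every row of a ResultSet; JDBC column indexes start at 1.
    static void printAll(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int n = meta.getColumnCount();
        String[] labels = new String[n];
        for (int i = 1; i <= n; i++) {
            labels[i - 1] = meta.getColumnLabel(i);
        }
        while (rs.next()) {
            Object[] values = new Object[n];
            for (int i = 1; i <= n; i++) {
                values[i - 1] = rs.getObject(i);
            }
            System.out.println(formatRow(labels, values));
        }
    }
}
```

Reading the labels once from ResultSetMetaData, rather than on every row, keeps the per-row loop down to value lookups.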
It's interesting to consider how the code would look in an equivalent Cascading app:
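import static cascading.flow.FlowDef.flowDef;

import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.local.LocalFlowConnector;
import cascading.pipe.CoGroup;
import cascading.pipe.Pipe;
import cascading.scheme.local.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.local.FileTap;
import cascading.tuple.Fields;
import cascading.tuple.TupleEntry;
import cascading.tuple.TupleEntryIterator;

// CSV source and sink taps: header row, comma delimiter, double-quote quoting
Tap empTap = new FileTap(new TextDelimited(true, ",", "\""), "src/test/data/employee.txt");
Tap salesTap = new FileTap(new TextDelimited(true, ",", "\""), "src/test/data/salesfact.txt");
Tap resultsTap = new FileTap(new TextDelimited(true, ",", "\""),
    "build/test/output/results.txt", SinkMode.REPLACE);

// inner join: employee.empid = salesfact.cust_id
Pipe empPipe = new Pipe("emp");
Pipe salesPipe = new Pipe("sales");
Pipe join = new CoGroup(empPipe, new Fields("empid"), salesPipe, new Fields("cust_id"));

FlowDef flowDef = flowDef()
    .setName("flow")
    .addSource(empPipe, empTap)
    .addSource(salesPipe, salesTap)
    .addTailSink(join, resultsTap);

Flow flow = new LocalFlowConnector().connect(flowDef);
flow.complete(); // blocks until the flow finishes; start() returns immediately

// read the joined tuples back from the sink tap
TupleEntryIterator iterator = flow.openSink();
while (iterator.hasNext()) {
    TupleEntry entry = iterator.next();
    System.out.println(entry.getTuple());
}
iterator.close();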
Arguably, that code is more compact than the JDBC version. Even so, Lingual lets Cascading apps read SQL queries from flat files or from command-line options, which makes it possible to reuse a large body of existing ANSI SQL queries.
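A minimal sketch of the "query kept in a flat file" pattern, using only standard JDBC, is shown next. The path to a .sql file and a JDBC URL are passed as command-line arguments. The class name, argument order, and trailing-semicolon handling are assumptions for illustration, not Lingual's own command-line interface. For Lingual, the URL would use the jdbc:lingual scheme (for example jdbc:lingual:local); check the Lingual documentation for the exact URL and schema options, and note that some driver versions may need the driver class loaded explicitly before calling DriverManager.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical runner: executes one ANSI SQL query stored in a flat file. */
class QueryFromFile {
    // Reads the query text and drops one trailing semicolon, which many JDBC drivers reject.
    static String loadQuery(Path path) throws IOException {
        String sql = Files.readString(path).trim();
        return sql.endsWith(";") ? sql.substring(0, sql.length() - 1).trim() : sql;
    }

    // usage (assumed): QueryFromFile <query.sql> <jdbc-url>
    public static void main(String[] args) throws IOException, SQLException {
        String sql = loadQuery(Path.of(args[0]));
        try (Connection conn = DriverManager.getConnection(args[1]);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int n = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                // Print each row as tab-separated column values.
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= n; i++) {
                    if (i > 1) line.append('\t');
                    line.append(rs.getObject(i));
                }
                System.out.println(line);
            }
        }
    }
}
```

Keeping the SQL outside the Java source means an existing query can be swapped in without recompiling the app.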
Integrating with Desktop Tools
Because Lingual provides a JDBC connector into Cascading workflows running on Apache Hadoop clusters, we can use many existing SQL tools with them. For example, Toad is a popular tool for interacting with SQL databases. RStudio (shown in Figure 6-4) is a popular IDE for statistical computing in R, and it can import data through JDBC.