Cascading in Practice
Now that we know what Cascading is and have a good idea of how it works, what does an application written in Cascading look like? See Example 24-1.
Example 24-1. Word count and sort
Scheme sourceScheme = new TextLine(new Fields("line"));
Tap source = new Hfs(sourceScheme, inputPath);

Scheme sinkScheme = new TextLine();
Tap sink = new Hfs(sinkScheme, outputPath, SinkMode.REPLACE);

Pipe assembly = new Pipe("wordcount");

String regexString = "(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)";
Function regex = new RegexGenerator(new Fields("word"), regexString);
assembly = new Each(assembly, new Fields("line"), regex);

assembly = new GroupBy(assembly, new Fields("word"));

Aggregator count = new Count(new Fields("count"));
assembly = new Every(assembly, count);

assembly = new GroupBy(assembly, new Fields("count"), new Fields("word"));

FlowConnector flowConnector = new FlowConnector();
Flow flow = flowConnector.connect("word-count", source, sink, assembly);

flow.complete();
We create a new Scheme that reads simple text files and emits a new Tuple for each line in a field named “line,” as declared by the Fields instance.
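To make the data flow of Example 24-1 concrete, here is a rough plain-Java sketch of what the assembly computes. It does not use Cascading at all. Each input string stands in for one tuple with a single “line” field, the regular expression is the one from the example, and the two grouping steps are approximated with an in-memory map and a sort. The class name WordCountSketch and the method countAndSort are hypothetical names chosen for this illustration, and the ordering (ascending by count, then by word) assumes natural ordering in the final GroupBy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the word-count assembly; not Cascading code.
public class WordCountSketch {

    // Same expression as regexString in Example 24-1.
    private static final Pattern WORD =
        Pattern.compile("(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)");

    public static List<Map.Entry<String, Long>> countAndSort(List<String> lines) {
        // Roughly the Each + RegexGenerator step followed by
        // GroupBy("word") and Count: one counter per distinct word.
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher matcher = WORD.matcher(line);
            while (matcher.find()) {
                counts.merge(matcher.group(), 1L, Long::sum);
            }
        }

        // Roughly the second GroupBy: order by "count", then by "word".
        List<Map.Entry<String, Long>> result = new ArrayList<>(counts.entrySet());
        result.sort(
            Comparator.comparing((Map.Entry<String, Long> e) -> e.getValue())
                .thenComparing(e -> e.getKey()));
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick the fox");
        for (Map.Entry<String, Long> entry : countAndSort(lines)) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

For the single input line "the quick the fox", the sketch yields fox and quick with a count of 1, followed by the with a count of 2. Unlike this in-memory version, the Cascading flow runs the same logical steps as distributed jobs over the files behind the source and sink taps.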