12/12/25 09:58:16 INFO flow.Flow: [Tutorial1] starting jobs: 1
12/12/25 09:58:16 INFO flow.Flow: [Tutorial1] allocating threads: 1
12/12/25 09:58:16 INFO flow.FlowStep: [Tutorial1] starting step: local
Then to confirm the results after the Scalding code has run:
$ cat tutorial/data/output1.txt
Hello world
Goodbye world
If your results look similar, you should be good to go. Otherwise, reach out to @Scalding on Twitter. Very helpful developers are available to assist.
Example 3 in Scalding: Word Count with Customized Operations
First, let's try a simple app in Scalding. Starting from the “Impatient” source code directory that you cloned in Git, change into the part8 subdirectory. Then we'll write a Word Count app in Scalding that includes a token scrub operation, similar to “Example 3: Customized Operations” on page 17:

import com.twitter.scalding._

class Example3(args : Args) extends Job(args) {
  Tsv(args("doc"), ('doc_id, 'text), skipHeader = true)
    .read
    .flatMap('text -> 'token) { text : String => text.split("[ \\[\\]\\(\\),.]") }
    .mapTo('token -> 'token) { token : String => scrub(token) }
    .filter('token) { token : String => token.length > 0 }
    .groupBy('token) { _.size('count) }
    .write(Tsv(args("wc"), writeHeader = true))

  def scrub(token : String) : String = {
    token
      .trim
      .toLowerCase
  }

  override def config(implicit mode: Mode): Map[AnyRef, AnyRef] = {
    // resolves "ClassNotFoundException cascading.*" exception on a cluster
    super.config(mode) ++ Map("cascading.app.appjar.class" -> classOf[Example3])
  }
}

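As an aside, the tokenize, scrub, filter, and count steps can be tried out without Scalding or Cascading on the classpath. The following is a minimal sketch using only plain Scala collections. The object name WordCountSketch, the helper wordCount, and the sample input are hypothetical and are not part of the “Impatient” source code. Only scrub and the split pattern are copied from Example3.

```scala
// Plain Scala collections only; no Scalding or Cascading dependency.
// WordCountSketch and wordCount are illustrative names, not part of the book's code.
object WordCountSketch {
  // Same logic as Example3.scrub: trim whitespace, then lowercase.
  def scrub(token: String): String = token.trim.toLowerCase

  // Mirrors the flatMap / mapTo / filter / groupBy steps of Example3
  // on an in-memory sequence of text lines.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("[ \\[\\]\\(\\),.]").toSeq) // split on space, brackets, parens, comma, period
      .map(scrub)                                  // normalize each token
      .filter(_.length > 0)                        // drop empty tokens left by adjacent delimiters
      .groupBy(identity)                           // collect identical tokens together
      .map { case (token, occurrences) => token -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // Made-up input, chosen to exercise the delimiters and the case folding.
    val sample = Seq("Hello world", "Goodbye, World (again).")
    wordCount(sample).toSeq.sortBy(_._1).foreach {
      case (token, count) => println(s"$token\t$count")
    }
  }
}
```

The filter step matters because adjacent delimiters, such as a comma followed by a space, produce empty strings when split. The Scalding job in Example3 removes them the same way with its filter on 'token.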
Let's compare the Example3 code for Word Count with the conceptual flow diagram for “Example 3: Customized Operations”, which is shown in Figure 4-1. The lines of Scalding source code have an almost 1:1 correspondence with the elements in this flow diagram. In other