Scalding—A Scala DSL for Cascading - Enterprise Data Workflows with Cascading


12/12/25 09:58:16 INFO flow.Flow: [ Tutorial1 ] starting jobs : 1

12/12/25 09:58:16 INFO flow.Flow: [ Tutorial1 ] allocating threads: 1

12/12/25 09:58:16 INFO flow.FlowStep: [ Tutorial1 ] starting step: local

Then to confirm the results after the Scalding code has run:

$ cat tutorial/data/output1.txt

Hello world

Goodbye world

If your results look similar, you should be good to go.

Otherwise, if you run into any trouble, contact the cascading-user email forum or tweet to @Scalding on Twitter. Very helpful developers are available to assist.

Example 3 in Scalding: Word Count with Customized Operations

First, let's try a simple app in Scalding. Starting from the “Impatient” source code directory that you cloned in Git, change into the part8 subdirectory. Then we'll write a Word Count app in Scalding that includes a token scrub operation, similar to “Example 3: Customized Operations” on page 17:

import com.twitter.scalding._

class Example3(args : Args) extends Job(args) {
  Tsv(args("doc"), ('doc_id, 'text), skipHeader = true)
    .read
    // split each line of text into tokens on spaces, brackets, parentheses, commas, and periods
    .flatMap('text -> 'token) { text : String => text.split("[ \\[\\]\\(\\),.]") }
    .mapTo('token -> 'token) { token : String => scrub(token) }
    // drop the empty tokens left behind by adjacent delimiters
    .filter('token) { token : String => token.length > 0 }
    .groupBy('token) { _.size('count) }
    .write(Tsv(args("wc"), writeHeader = true))

  // the customized operation: trim whitespace and convert to lowercase
  def scrub(token : String) : String = {
    token
      .trim
      .toLowerCase
  }

  override def config(implicit mode : Mode) : Map[AnyRef, AnyRef] = {
    // resolves "ClassNotFoundException cascading.*" exception on a cluster
    super.config(mode) ++ Map("cascading.app.appjar.class" -> classOf[Example3])
  }
}
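To see what each step of that pipe does without starting a Cascading flow, the same sequence can be sketched with plain Scala collections. This is only an illustration: the object name WordCountSketch, the wordCount helper, and the sample lines are hypothetical and not part of the “Impatient” code, and the sketch assumes the split pattern covers spaces, square brackets, parentheses, commas, and periods.

```scala
// Plain Scala collections sketch of the Word Count steps (no Scalding or Hadoop needed).
// WordCountSketch and wordCount are hypothetical names used only for illustration.
object WordCountSketch {
  // Same cleanup as the job's scrub method: trim whitespace, then lowercase.
  def scrub(token: String): String = token.trim.toLowerCase

  // Mirrors flatMap -> mapTo -> filter -> groupBy/size over an in-memory list of lines.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("[ \\[\\]\\(\\),.]")) // split each line into raw tokens
      .map(scrub)                            // normalize each token
      .filter(_.length > 0)                  // drop empty tokens left by adjacent delimiters
      .groupBy(identity)                     // group identical tokens together
      .map { case (token, occurrences) => token -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input, not taken from the tutorial data set.
    val sample = Seq("Hello world", "Goodbye, world (again).")
    wordCount(sample).toSeq.sortBy(_._1).foreach { case (token, count) =>
      println(s"$token\t$count")
    }
  }
}
```

The collections version runs in a single JVM and holds everything in memory, whereas the Scalding job above is planned by Cascading as a flow that can run locally or on a cluster; the step-by-step logic is the part the two share.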

Let's compare the Scalding code for Word Count with the conceptual flow diagram for “Example 3: Customized Operations”, which is shown in Figure 4-1. The lines of Scalding source code have an almost 1:1 correspondence with the elements in this flow diagram. In other
