12/12/25 09:58:16 INFO flow.Flow: [ Tutorial1 ] starting jobs : 1
12/12/25 09:58:16 INFO flow.Flow: [ Tutorial1 ] allocating threads: 1
12/12/25 09:58:16 INFO flow.FlowStep: [ Tutorial1 ] starting step: local
Then, to confirm the results after the Scalding code has run, check the output file:
$ cat tutorial/data/output1.txt
Hello world
Goodbye world
If your results look similar, you should be good to go.
Otherwise, if you run into trouble, contact the cascading-user email forum or tweet
to @Scalding on Twitter, where helpful developers are available to assist.
Example 3 in Scalding: Word Count with Customized Operations
First, let's try a simple app in Scalding. Starting from the “Impatient” source code
directory that you cloned in Git, change into the part8 subdirectory. Then we'll write a
Word Count app in Scalding that includes a token scrub operation, similar to “Example
3: Customized Operations” on page 17:
import com.twitter.scalding._

class Example3(args : Args) extends Job(args) {
  Tsv(args("doc"), ('doc_id, 'text), skipHeader = true)
    .read
    .flatMap('text -> 'token) { text : String => text.split("[ \\[\\]\\(\\),.]") }
    .mapTo('token -> 'token) { token : String => scrub(token) }
    .filter('token) { token : String => token.length > 0 }
    .groupBy('token) { _.size('count) }
    .write(Tsv(args("wc"), writeHeader = true))

  def scrub(token : String) : String = {
    token
      .trim
      .toLowerCase
  }

  override def config(implicit mode : Mode) : Map[AnyRef, AnyRef] = {
    // resolves "ClassNotFoundException cascading.*" exception on a cluster
    super.config(mode) ++ Map("cascading.app.appjar.class" -> classOf[Example3])
  }
}
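To see what each stage of that pipeline does to the data, it can help to run the same steps on an in-memory collection. The following is a minimal sketch in plain Scala with no Scalding dependency; the object name WordCountSketch, the wordCount helper, and the sample sentences are hypothetical and not part of the tutorial code. Only the split pattern and the scrub logic are copied from Example3.

```scala
// Plain-Scala sketch of the Example3 stages; not the Scalding API.
object WordCountSketch {
  // Same scrub logic as Example3: trim whitespace, then lowercase.
  def scrub(token: String): String = token.trim.toLowerCase

  // Mirrors flatMap -> mapTo -> filter -> groupBy/size on an in-memory Seq.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("[ \\[\\]\\(\\),.]")) // split on space, brackets, parens, comma, period
      .map(scrub)                            // normalize each token
      .filter(_.nonEmpty)                    // drop empty tokens left between adjacent delimiters
      .groupBy(identity)                     // gather identical tokens
      .map { case (token, occurrences) => token -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input, not taken from the tutorial data.
    val counts = wordCount(Seq("Hello world, hello (World).", "Goodbye [world]"))
    counts.toSeq.sortBy(_._1).foreach { case (token, n) => println(s"$token\t$n") }
  }
}
```

For this sample input the sketch counts goodbye once, hello twice, and world three times. The filter step matters because splitting on single delimiter characters produces empty strings wherever two delimiters sit side by side, such as a comma followed by a space.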
Let's compare this code for Word Count with the conceptual flow diagram for “Example
3: Customized Operations”, which is shown in Figure 4-1. The lines of Scalding source
code have an almost 1:1 correspondence with the elements in this flow diagram. In other