To build:
$ lein clean
$ lein uberjar
Created /Users/ceteri/opt/Impatient/part6/target/impatient.jar
To run:
$ rm -rf output
$ hadoop jar target/impatient.jar data/rain.txt output/wc \
data/en.stop output/tfidf
To verify the output:
$ cat output/trap/part-m-00001-00001
zoink
$ head output/tfidf/part-00000
doc02 0.22314355131420976 area
doc01 0.44628710262841953 area
doc03 0.22314355131420976 area
doc05 0.9162907318741551 australia
doc05 0.9162907318741551 broken
doc04 0.9162907318741551 california's
doc04 0.9162907318741551 cause
doc02 0.9162907318741551 cloudcover
doc04 0.9162907318741551 death
doc04 0.9162907318741551 deserts
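The scores in this sample match a weight of tf × ln(N / (1 + df)). Here tf is the term's count in a document, N is the number of documents, and df is the number of documents that contain the term. Assuming N = 5 (doc01 through doc05), "area" appears in three documents and scores ln(5/4) ≈ 0.2231 per occurrence. Its score in doc01 is twice that, so it presumably occurs twice there. A term found in only one document scores ln(5/2) ≈ 0.9163. The sketch below reproduces that arithmetic. The function tf-idf-score and the counts passed to it are hypothetical: they are inferred from the output, not taken from the project's source.

```clojure
(ns impatient.tfidf-check)

;; Hypothetical helper, not part of impatient.core: the raw term count
;; in one document times ln(n-docs / (1 + number of documents with the term)).
(defn tf-idf-score [tf-count df-count n-docs]
  (* tf-count (Math/log (/ (double n-docs) (+ 1.0 df-count)))))

;; Assumed corpus size: five documents, doc01 through doc05.
(def n-docs 5)

;; Counts inferred from the sample output, not read from data/rain.txt.
(println "area, doc02:" (tf-idf-score 1 3 n-docs))      ; once, term in 3 docs
(println "area, doc01:" (tf-idf-score 2 3 n-docs))      ; twice, term in 3 docs
(println "australia, doc05:" (tf-idf-score 1 1 n-docs)) ; once, term in 1 doc
```

Under this formula, a term present in four of the five documents would score ln(5/5) = 0. Very common words are therefore suppressed even without a stop-word list.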
“Example 6 in Cascalog: TF-IDF with Testing” also includes unit tests, with source code
in the test/impatient/core_test.clj file:
(ns impatient.core-test
  (:use impatient.core
        clojure.test
        cascalog.api
        [midje sweet cascalog]))

(deftest scrub-text-test
  (fact
    (scrub-text "FoO BAR ") => "foo bar"))

(deftest etl-docs-gen-test
  (let [rain [["doc1" "a b c"]]
        stop [["b"]]]
    (fact
      (etl-docs-gen rain stop) => (produces [["doc1" "a"]
                                             ["doc1" "c"]]))))
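The two facts above specify the behavior expected from impatient.core, whose source is not shown here. The following plain-Clojure sketch illustrates that behavior without Hadoop, Cascalog, or Midje. It defines functions that satisfy the same expectations and checks them with clojure.test. The namespace impatient.sketch and the functions scrub-text-sketch and etl-docs-seq are hypothetical, and splitting on whitespace is an assumption based only on the test fixture. The real etl-docs-gen builds a Cascalog query rather than a lazy sequence.

```clojure
(ns impatient.sketch
  (:require [clojure.string :as s]
            [clojure.test :refer [deftest is run-tests]]))

;; Hypothetical stand-in for scrub-text: trim surrounding whitespace,
;; then lower-case, which is what scrub-text-test expects.
(defn scrub-text-sketch [text]
  (-> text s/trim s/lower-case))

;; Hypothetical plain-Clojure analogue of etl-docs-gen: split each
;; document's scrubbed text on whitespace and drop any stop word.
;; rain and stop are vectors of tuples, like the test fixtures.
(defn etl-docs-seq [rain stop]
  (let [stop-words (set (map first stop))]
    (for [[doc-id text] rain
          token (s/split (scrub-text-sketch text) #"\s+")
          :when (not (contains? stop-words token))]
      [doc-id token])))

;; The same expectations as the Midje facts, written with clojure.test.
(deftest scrub-text-sketch-test
  (is (= "foo bar" (scrub-text-sketch "FoO BAR "))))

(deftest etl-docs-seq-test
  (is (= [["doc1" "a"] ["doc1" "c"]]
         (etl-docs-seq [["doc1" "a b c"]] [["b"]]))))

(run-tests 'impatient.sketch)
```

This sequence version returns tuples in input order, so the plain = comparison is enough for a one-document fixture.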
Note the reference to midje in the ns declaration of impatient.core-test. These tests are based on a test framework
called Midje-Cascalog, described by Ritchie on his GitHub project and in substantially
more detail in his article about best practices for Cascalog testing.
Midje enables you to test Cascalog queries as functions, whether they run in isolation or
as part of a larger workflow. Each test definition in core_test.clj uses fact