wikipedia.#it.wikipedia,
wikipedia.#nl.wikipedia,
wikipedia.#pt.wikipedia,
wikipedia.#es.wikipedia,
wikipedia.#ru.wikipedia,
wikipedia.#sv.wikipedia,
wikipedia.#zh.wikipedia,
wikipedia.#fi.wikipedia
This results in much more interesting traffic to analyze later.
The Hello Samza application also includes a parser job that converts
the raw JSON into a more useful form. Because Samza uses Kafka for
its communication, the parsed output is available as another Kafka
topic called wikipedia-edits. The parser job is started the same way
as the first Samza job, just with a different properties file:
$ ./bin/run-job.sh \
> --config-factory=\
> org.apache.samza.config.factories.PropertiesConfigFactory \
> --config-path=file://`pwd`/config/wikipedia-parser.properties
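Once the parser job is running, each message on wikipedia-edits is a JSON document that downstream code can decode directly. The following is a minimal Python sketch of that decoding step, not part of Hello Samza itself. It uses only the fields visible in the sample output that follows (summary, time, title, flags), assumes that time is a Unix timestamp in milliseconds, and uses a hypothetical helper name, describe_edit. Real messages carry more fields than the abbreviated sample used here.

```python
import json
from datetime import datetime, timezone

# Abbreviated sample built only from the fields visible in the excerpt;
# actual messages on the topic contain additional fields.
SAMPLE = (
    '{"summary":"/* Episodes */ fix",'
    '"time":1382240999886,'
    '"title":"Cheers (season 5)",'
    '"flags":{"is-bot-edit":false,"is-talk":false,"is-unpatrolled":false,'
    '"is-new":false,"is-special":false,"is-minor":false}}'
)


def describe_edit(message: str) -> dict:
    """Decode one parsed-edit message (hypothetical helper for illustration)."""
    edit = json.loads(message)
    flags = edit.get("flags", {})
    return {
        "title": edit.get("title"),
        "summary": edit.get("summary"),
        # Assumption: "time" is milliseconds since the Unix epoch.
        "when": datetime.fromtimestamp(edit["time"] / 1000, tz=timezone.utc),
        # Names of the flags that are set to true, in sorted order.
        "active_flags": sorted(name for name, on in flags.items() if on),
    }


if __name__ == "__main__":
    info = describe_edit(SAMPLE)
    print(info["title"], info["when"].isoformat(), info["active_flags"])
```

Collecting only the flags that are true keeps later filtering simple: an edit with an empty active_flags list is an ordinary, non-bot, non-minor change to an existing page.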
The output from this stream should look something like this:
{"summary":"/* Episodes */ fix",
"time":1382240999886,
"title":"Cheers (season 5)",
"flags":{
"is-bot-edit":false,
"is-talk":false,
"is-unpatrolled":false,
"is-new":false,
"is-special":false,
"is-minor":false