5
Distributed Data Processing with Cascalog
In this chapter, we will cover the following recipes:
• Initializing Cascalog and Hadoop for distributed processing
• Querying data with Cascalog
• Distributing data with Apache HDFS
• Parsing CSV files with Cascalog
• Executing complex queries with Cascalog
• Aggregating data with Cascalog
• Defining new Cascalog operators
• Composing Cascalog queries
• Transforming data with Cascalog
Introduction
Over the course of the last few chapters, we've been progressively moving outward. We started
with the assumption that everything will run on one processor, probably in a single thread.
Then we looked at how to structure our program without this assumption, performing different
tasks on many threads. We then tried to speed up processing by getting multiple threads and
cores working on the same task. Now we've pulled back about as far as we can, and we're
going to take a look at how to break up work in order to execute it on multiple computers. For
large amounts of data, this can be especially useful.