exploration to parallel computing, publication, and education.” 6 In fact, iPython has
seen quite a lot of growth among scientific users, and as a result the project has also
been awarded a grant by the Sloan Foundation to help drive development of more
collaborative and interactive visualization tools.
iPython adds an important tooling layer to the standard Python shell, including
features such as autocomplete and the ability to access interactive help. It's very easy to
incorporate existing Python scripts into iPython's interactive workflow. iPython also
has an excellent notebook mode that provides iPython's features through an interactive
Web application. When starting up iPython with the notebook command, a Web
server will be launched directly on the workstation, and a browser-based interface
becomes available on a local URL. Python commands and output can be run directly
in the browser window, and best of all, these notebooks can be saved, exported, and
shared with others.
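As a quick illustration, the short session below sketches a few of these conveniences. The script name analyze.py is just a placeholder, and the exact prompts may differ slightly between iPython versions.

# Append a question mark to any object to open its interactive help.
In [1]: sum?

# Run an existing Python script inside the session with the %run magic;
# analyze.py is a hypothetical script used here only for illustration.
In [2]: %run analyze.py

# Tab completion works on modules, objects, and file paths.
In [3]: import numpy as np

# To launch the browser-based notebook interface instead, start iPython
# from the system shell with: ipython notebook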
Parallelizing iPython Using a Cluster
As we've mentioned before, one of the advantages of distributed-processing frameworks
such as Hadoop is the ability to wrangle multiple machines to help solve large
data problems quickly. For many people, Hadoop is the de facto method of running
such tasks, but it's not always the best fit for the job. Although Hadoop is becoming
more and more automated, there's often quite a lot of administrative overhead when
initializing and running a Hadoop cluster, not to mention a great deal of work in
writing the workflow code (see Chapter 9, “Building Data Transformation Workflows
with Pig and Cascading,” for more on Hadoop workflow tools). Often, all we want to
do is simply farm a task out to a number of machines or even a set of processors on a
multicore machine with as little effort as possible.
iPython makes it easy to run tasks in parallel by coordinating the execution of Python
commands across a distributed network of machines (which iPython calls engines).
iPython takes advantage of the very fast message-passing library called ØMQ (a.k.a.
ZeroMQ) to coordinate messaging between multiple machines. Even if you don't have
a cluster of machines available, you can observe some of the advantages of parallel
computing on a multicore local machine. As with Hadoop, it's possible to test iPython
scripts locally before extending them to run across a cluster of machines. Even better,
iPython enables you to run these scripts interactively.
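To give a feel for the moving parts, here is a minimal sketch of connecting to a set of locally running engines. It assumes the IPython.parallel module shipped with the iPython releases current at the time of writing, and a cluster of four engines started beforehand; the function being mapped is purely illustrative.

# Start four local engines first, from the system shell:
#     ipcluster start -n 4

from IPython.parallel import Client

# Connect to the running engines and create a view over all of them.
client = Client()
view = client[:]

# Farm a simple function out to every engine and collect the results.
results = view.map_sync(lambda x: x ** 2, range(10))
print(results)

The map_sync call blocks until every engine has returned its piece of the work, which keeps the example easy to follow in an interactive session.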
Let's look at a simple example meant to tax the CPU a bit. In Listing 12.7, we use
the NumPy random package to generate a list of 1,000 integers between 1,000,000 and
20,000,000, and we'll check each one for primality by brute force. Essentially, we will
divide (using a modulo operation) each number by every integer from two up to the
square root of the number itself. If any of these divisions leaves a remainder of zero, the
number is reported as not prime. This approach requires thousands of large division
operations per candidate number. Our first try will be a simple nondistributed solution.
Next, we will demonstrate a solution using iPython's parallel library.
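As a rough sketch of the nondistributed approach (not the book's actual Listing 12.7, and with placeholder names), the brute-force check might look something like this:

import numpy as np

def check_prime(n):
    # Trial division: test every integer from 2 up to the square root of n.
    # Any divisor that leaves a remainder of zero means n is not prime.
    for divisor in range(2, int(np.sqrt(n)) + 1):
        if n % divisor == 0:
            return False
    return True

# Generate 1,000 random integers between 1,000,000 and 20,000,000.
candidates = np.random.randint(1000000, 20000000, 1000)

# Check each candidate one at a time, on a single core.
primes = [n for n in candidates if check_prime(n)]
print(len(primes))

Checking all 1,000 candidates one at a time on a single core can take a noticeable amount of time, which is exactly the kind of workload we'd like to split across engines.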
6. www.fsf.org/news/2012-free-software-award-winners-announced-2
 