In the Java landscape, this problem is even more pronounced because of Java's difficulty in
addressing large amounts of RAM (anecdotally, 2GB to 4GB is about the most a single JVM can usefully
address). Garbage collectors now in development aim to fix some of these issues, but the fact
remains that a single computer can have far more RAM than a single JVM can usefully manage.
Parallelization is a must. Today, more and more enterprises are deploying entire virtualized operating
system stacks on one server simply to isolate Java applications and fully exploit the hardware.
Thus, distribution isn't just a function of resilience or capability; it's a matter of common sense.
There are costs to parallelization, as well. There's always going to be some constraint, and very
rarely is an entire system equally scalable. The cost of coordinating state between nodes, for example,
might be too high because the network or hard disks impose latency. Notably, not all operations are
parallelizable, and it's important to design systems with this in mind. Consider the overall processing
of a person's uploaded photos, as happens on many web sites today. Measure the time from the moment
the user uploads a batch to the moment a process has watermarked the photos and added them to an
online photo album, with every step executed serially. Some of these steps, such as the upload and the
final posting, are not parallelizable. The one part that is, the watermarking, can yield only a bounded
improvement, and little can be done beyond that.
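To make the serial/parallel split concrete, here is a minimal plain-Java sketch, assuming a hypothetical watermark method that stands in for the real per-photo work. The input list plays the role of the already-completed upload, and collecting results into the album plays the role of the serial posting step; only the watermarking is handed to a fixed-size thread pool from java.util.concurrent. The class and method names are illustrative, not part of any library discussed in this chapter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PhotoPipeline {

    // Hypothetical stand-in for the real watermarking work on a single photo.
    static String watermark(String photo) {
        return photo + "+wm";
    }

    // The uploaded list arrives serially; only watermarking runs on the pool.
    static List<String> process(List<String> uploaded, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Parallel part: each photo is submitted to a different task.
            List<Future<String>> pending = new ArrayList<>();
            for (String photo : uploaded) {
                pending.add(pool.submit(() -> watermark(photo)));
            }
            // Serial part: "post" results to the album one by one, in upload order.
            List<String> album = new ArrayList<>();
            for (Future<String> result : pending) {
                album.add(result.get());
            }
            return album;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while watermarking", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("watermarking failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> batch = List.of("a.jpg", "b.jpg", "c.jpg");
        System.out.println(process(batch, 10));
    }
}
```

However many threads the pool gets, the time spent before submission and while collecting results is untouched, which is exactly the part of the pipeline that caps the overall gain.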
You can quantify these gains. Amdahl's law, also known as Amdahl's argument, is a formula for finding
the maximum expected improvement to an overall system when only part of the system is improved. It
is shown here:

$$S(N) = \frac{1}{(1 - P) + \frac{P}{N}}$$
It describes the relationship between a solution's execution time when executed serially and when
executed in parallel on the same problem set. For example, suppose we have 90 photos, watermarking
takes a minute per photo, uploading takes 5 minutes, and posting the resulting photos to the repository
takes 5 minutes; the total time is 100 minutes when executed serially. Let's assume we add 9 workers to
the watermarking process, for a total of 10 processes that watermark. In the equation, P is the portion of
the process that can be parallelized, and N is the factor by which that portion might be parallelized (that
is, the number of workers, in this case). For the process described, 90 of the 100 minutes are spent
watermarking, and each photo could be given to a different worker, so P is 0.9. With 10 workers, the
equation is 1/((1 - .9) + (.9 / 10)), or about 5.263: the serial 10 minutes stay as they are, the 90 minutes
of watermarking shrink to 9, and 100/19 is roughly 5.263. So, with 10 workers, the process could be about
5x faster. With 100 workers, the equation yields 9.174, or roughly 9x faster. No matter how many workers
you add, the speedup can never exceed 1/(1 - P), or 10x here, so it may not make sense to keep adding
nodes: you'll achieve increasingly smaller gains.
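The arithmetic above is easy to check with a few lines of Java. This is a small sketch of Amdahl's law as stated in this section, with P taken as 90 of the 100 serial minutes; the class name Amdahl and the method speedup are our own illustrative choices.

```java
public class Amdahl {

    // Amdahl's law: speedup = 1 / ((1 - p) + p / n)
    // p = fraction of the serial run time that can be parallelized
    // n = number of workers sharing the parallelizable part
    static double speedup(double p, int n) {
        if (p < 0 || p > 1 || n < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and n >= 1");
        }
        return 1.0 / ((1 - p) + p / n);
    }

    public static void main(String[] args) {
        // Photo example: 90 of 100 serial minutes are watermarking.
        double p = 90.0 / 100.0;
        for (int n : new int[] {1, 10, 100, 1000}) {
            System.out.printf("%5d workers: %.3fx%n", n, speedup(p, n));
        }
        // As n grows without bound, p / n vanishes and the speedup tends to 1 / (1 - p).
        System.out.printf("upper bound: %.3fx%n", 1.0 / (1 - p));
    }
}
```

Running it should show 5.263x for 10 workers and 9.174x for 100, matching the figures in the text, while going from 100 to 1,000 workers buys less than one additional x of speedup.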
Building an effective distributed solution, then, is an application of cost/benefit analysis. Spring has
no direct support for distributed paradigms, per se, because plenty of other solutions do a great job
already. Often, these solutions make Spring integration a first priority because Spring is a de facto
standard. In some cases, these projects have forgone their own configuration format and use Spring itself
as the configuration mechanism. If you decide to employ distribution, you'll be glad to know that many
projects are designed to answer the call, whatever it may be.
In this chapter, we discuss a few solutions that are Spring-friendly and ready to use. Many of these
solutions are possible because of Spring's support for “components,” such as its XML schema support and
runtime class detection. These technologies often require you to change your frame of mind when
building solutions, even if ever so slightly, compared to solutions built using JEE, but being able to
rely on your Spring skills is powerful. In other cases, these solutions may not even be visible except as
configuration. Furthermore, many of these solutions expose themselves as standard interfaces familiar to
JEE developers, or as infrastructure (for example, as the backing store for an HTTP session or as a
cluster-ready message queue) that goes unnoticed and stays isolated, except at the configuration level,
thanks to Spring's dependency injection.
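The reason such infrastructure can stay invisible is that application code depends only on an interface and receives its implementation from outside. The following plain-Java sketch shows that shape with constructor injection, assuming hypothetical types PhotoQueue, InMemoryPhotoQueue, and AlbumService; it deliberately uses no Spring API. In a Spring application, the choice of which PhotoQueue to inject would be made in configuration, so a cluster-backed implementation could replace the in-memory one without touching AlbumService.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical abstraction that the business code programs against.
interface PhotoQueue {
    void enqueue(String photoId);

    String poll(); // returns null when the queue is empty
}

// Local, single-JVM implementation; a clustered one could implement the same interface.
class InMemoryPhotoQueue implements PhotoQueue {
    private final Deque<String> items = new ArrayDeque<>();

    public synchronized void enqueue(String photoId) {
        items.addLast(photoId);
    }

    public synchronized String poll() {
        return items.pollFirst();
    }
}

// Business code: knows only the interface and receives it through its constructor.
class AlbumService {
    private final PhotoQueue queue;

    AlbumService(PhotoQueue queue) {
        this.queue = queue;
    }

    void upload(String photoId) {
        queue.enqueue(photoId);
    }

    String nextToWatermark() {
        return queue.poll();
    }
}

public class Wiring {
    public static void main(String[] args) {
        // With Spring, this wiring decision would live in configuration rather than in code.
        AlbumService service = new AlbumService(new InMemoryPhotoQueue());
        service.upload("photo-1");
        System.out.println(service.nextToWatermark());
    }
}
```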