Another redundancy-based design technique is the use of services such
as clustering (linking many computers together so that they act as a single,
faster computer), load balancing (distributing workloads across multiple
computers), data replication (making multiple identical copies of data so
that they can be processed independently and in parallel), and protecting
complex operations with transactions to ensure process integrity. Naturally,
when one is using cloud provider services, many of these capabilities are
built into the base infrastructure and services.
Redundant hardware is one of the most popular strategies for provid-
ing reliable systems. This includes redundant arrays of independent disks
(RAID) for data storage, redundant network interfaces, and redundant power
supplies. With this kind of hardware infrastructure, individual component
failures can occur without affecting the overall reliability of the applica-
tion. It is important to use standardized commodity hardware to allow easy
installation and replacement.
In 2008, Google had to solve the problem of searching across all content
on the Web, which was bordering on one trillion unique URLs. They ended
up employing loosely coupled distributed computing on a massive scale:
clusters of commodity (cheap) computers working in parallel on large data
sets. Even with individual servers having excellent reliability statistics, with
hundreds of thousands of servers there were still multiple failures per day
as one machine or another reached its mean time to failure. Google had no
choice but to give up on achieving reliability in hardware and to achieve it
instead through software. The only way to build a reliable system across a
large group of unreliable computers is to employ software that addresses
those failures. MapReduce was the software framework invented by Google
for this purpose; the name MapReduce was inspired by the map and reduce
functions of the functional programming language Lisp.
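To make the abstraction concrete, here is a minimal single-machine sketch
of the MapReduce pattern in Python, using the classic word-count example;
the function names are illustrative, not Google's API. A real framework
distributes the map and reduce tasks across the cluster and transparently
reruns any task whose machine fails.

from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input document.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    # Reduce: sum all counts emitted for one word.
    return word, sum(counts)

documents = ["the quick brown fox", "the lazy dog"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
result = dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
print(result)  # {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}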
Parallel programming on a massive scale has the potential not only to
address the issue of reliability but also to deliver a huge boost in perfor-
mance. This is opportune because, with data sets as large as the Web's, the
processing itself, reliability aside, may not be achievable without massive
parallelism.
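As a small single-machine illustration of such data parallelism, assuming a
hypothetical word_count task, Python's multiprocessing module can apply
the same function to many inputs across independent worker processes:

from multiprocessing import Pool

def word_count(document):
    # A pure, stateless task that can run on any worker process.
    return len(document.split())

if __name__ == "__main__":
    documents = ["the quick brown fox", "jumps over", "the lazy dog"]
    with Pool() as pool:  # one worker per CPU core by default
        counts = pool.map(word_count, documents)
    print(counts)  # [4, 2, 3]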
17.1.1 Functional Programming Paradigm
The functional programming paradigm treats computation as the evaluation
of mathematical functions, with zero (or minimal) maintenance of state or
updating of data. As opposed to procedural programming in languages such
as C or Java, it emphasizes that the application be written entirely as
functions that do not save any state. Such functions are called pure functions.
This is the first similarity with the MapReduce abstraction: all input and
output values are passed as parameters, and the map and reduce functions
are not expected to save state. However, the values can be input and output