bottleneck in these systems, latency may be reduced by having the machines in close
proximity to one another. This means we need to have direct control of the com-
puter hardware itself to solve our problem. We will also need space, plenty of power, a
backup power supply, security, and cooling systems. In other words, we need to build
and maintain our own data center.
Or do we? Computing is on its way to becoming a utility, and in the future, a lot
of the computing resources we consume will be available in much the same way as
water and power: metered service right out of the tap. For many software applica-
tions, most of the heavy lifting will take place on platforms or virtual machines with
the bulk of processing taking place far away in large data centers. This trend is already
very visible on the Web and with mobile applications. From Yelp to Netflix to your
favorite social games, how many apps on your smartphone are essentially just interfaces
to cloud-based services?
Unfortunately, many hurdles must be overcome before the cloud can become the
de facto home of data processing. A common mantra of large data processing is to
make sure that processing takes place as close to the data as possible. This concept is
what makes the design of Hadoop so attractive: data is distributed across server
nodes, and processing takes place on the nodes where the data resides. In order to use cloud systems for the
processing of large amounts of in-house data, data would need to be moved using the
relatively small bandwidth of the Internet. Similarly, data generated by an application
hosted on one cloud provider might need to be moved to another cloud service for
processing. These steps take time and reduce the overall performance of the system in
comparison to a solution in which the data is accessible in a single place. Most impor-
tantly, there are a range of security, compliance, and regulatory concerns that need to
be addressed when moving data from one place to another.
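A quick back-of-envelope calculation illustrates how much the network matters. The sketch below, with illustrative (not sourced) dataset sizes and link speeds, compares the time to move a dataset over a typical Internet uplink versus a local data-center network:

```python
# Back-of-envelope estimate of bulk data transfer time.
# The dataset size and link speeds are assumptions for illustration only.

def transfer_time_hours(size_tb: float, link_mbps: float) -> float:
    """Hours needed to move size_tb terabytes over a link_mbps link."""
    size_bits = size_tb * 1e12 * 8           # terabytes -> bits
    seconds = size_bits / (link_mbps * 1e6)  # bits / (bits per second)
    return seconds / 3600

# 10 TB over a 100 Mbps Internet uplink vs. a 10 Gbps data-center LAN
print(round(transfer_time_hours(10, 100), 1))     # ~222.2 hours (over 9 days)
print(round(transfer_time_hours(10, 10_000), 1))  # ~2.2 hours
```

Even ignoring protocol overhead and contention, shipping tens of terabytes across the public Internet takes days, while the same transfer inside a data center takes hours; this is the arithmetic behind the "move the computation, not the data" mantra.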
The disadvantages of using a public cloud include an inability to make changes to
the underlying infrastructure, and this loss of control can even result in greater costs
overall. Maintaining your own hardware also provides some flexibility in wringing
every last bit of performance from the system. For some applications, this might be a
major concern.
Currently, it's possible to access cloud computing resources that are maintained in
off-site data centers as a service. These are sometimes referred to by the slightly
misleading term private clouds. It is also possible to lease dedicated servers in data
centers that don't share hardware with other customers. These private clouds can often
provide more control over the underlying hardware, leading to the potential for higher
performance data processing.
A potential advantage of not dealing with physical infrastructure is that more time
can be devoted to data analysis. If your company is building a Web application, why
divert engineering resources to the administrative overhead of managing the security
and networking needed to run a cluster of computers? In
reality, depending on the type of application being built, managing clusters of virtual
server instances in the cloud might be just as time consuming as managing physical
hardware. To truly avoid the overhead of managing infrastructure, the best solution is
to use data processing as a service tools (discussed later in this chapter).
 