specifically geared toward developers building new software but are not the right
fit for analysts. Other tools require extensive management of hardware infrastructure,
whether for virtual or physical machines, making the environment difficult to navigate.
People looking for data solutions should accept that ambiguity around best practices
is currently the norm. Similarly, in the current landscape, it may not be possible to
buy a commercial-software solution for every use case. In many cases, parts of a data
pipeline will need to be built.
Large data challenges are often best solved using distributed software that runs on
a cluster of commodity hardware. Examples of this type of technology include the
open-source Apache Hadoop framework and many available distributed databases.
Building an in-house cluster of physical machines can offer good performance per
dollar for some applications, but the total cost may be prohibitive once maintenance
and software administration are included.
For many solutions, it makes more sense to use clusters of virtual servers hosted
by an off-site data center. This can take the form of either a private cloud, featuring
dedicated leased hardware, or a public cloud, which is typically a collection of virtual-
ized servers running on underlying hardware shared by many customers. Both of these
cloud models can be more cost effective than managing physical hardware in house.
The public cloud model is especially useful for organizations that are in the process of
evaluating new software or scaling up capacity. Distributed computing instances can
be grown or shrunk with demand, helping to keep costs manageable. As a rule, avoid
dealing with physical hardware investments whenever possible.
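The cost advantage of growing and shrinking capacity with demand can be made concrete with a bit of arithmetic. The sketch below compares a fixed fleet sized for peak load against an elastic fleet that matches hourly demand; the hourly rate and the demand curve are hypothetical values for illustration, not real cloud prices.

```python
# Illustrative comparison of fixed vs. elastic provisioning costs.
# HOURLY_RATE and the demand curve are assumed values, not real prices.

HOURLY_RATE = 0.50  # assumed cost per instance-hour

# Hypothetical number of instances needed in each hour of a day
# (load peaks midday, repeated for a 24-hour cycle).
demand = [2, 2, 2, 4, 8, 12, 16, 16, 12, 8, 4, 2] * 2  # 24 hourly values

# A fixed fleet must be sized for peak demand around the clock.
fixed_cost = max(demand) * len(demand) * HOURLY_RATE

# An elastic fleet grows and shrinks to match demand hour by hour.
elastic_cost = sum(demand) * HOURLY_RATE

print(f"fixed:   ${fixed_cost:.2f}")    # cost of always running the peak fleet
print(f"elastic: ${elastic_cost:.2f}")  # cost of demand-matched capacity
```

With these assumed numbers the elastic fleet costs less than half as much as the fixed one, which is why the public cloud model suits evaluation and scale-up phases where demand is uneven.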
In the IT world, there are many guidelines and best practices for determining when
to make an investment in an existing product. Often dedicated hardware or software
solutions may not even be necessary to solve a particular data problem. In order to
determine whether your organization has the skills necessary to build and maintain
an in-house data solution, start by implementing a small proof-of-concept project. If
the audience for your data solution consists mainly of analysts within your organiza-
tion, look to buy solutions that focus on ease of use and stability. On the other hand, if
solving the data challenge may provide a considerable competitive advantage, consider
focusing first on evaluating the potential for building custom solutions.
Reduce the number of variables necessary to understand the requirements for
building a solution. If your organization is considering building a solution to cope
with a data challenge, it can be effective to scope the evaluation effort using a small
subset of data and a single-machine proof of concept. Next, consider the challenges
that will come with scale. Some software solutions can be pushed to larger and larger
workloads, but realizing that scale may require considerable engineering effort.
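One way to scope such an evaluation is to draw a uniform sample of the full dataset that fits on a single machine. The sketch below uses reservoir sampling, which streams the data once and keeps only the sample in memory; the file name and sample size are placeholders for illustration.

```python
# Sketch: scoping a proof of concept with a uniform sample of a large
# dataset. Reservoir sampling reads the stream once and holds only
# k items in memory, so the input can be far larger than RAM.
import random

def reservoir_sample(items, k, seed=42):
    """Return k items chosen uniformly from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # replace with decreasing probability
            if j < k:
                sample[j] = item
    return sample

# Hypothetical usage: stream a large log once, keep 1,000 records
# for the single-machine proof of concept.
# with open("events.log") as f:
#     subset = reservoir_sample(f, k=1000)
subset = reservoir_sample(range(1_000_000), k=1000)
print(len(subset))  # 1000
```

Because the sample is taken in a single pass, the same script can later be pointed at the full dataset on a cluster without changing the evaluation logic.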
Overall, the software tools available for collecting, processing, and analyzing large
datasets are in a state of flux, and best practices and common patterns are still being
developed. As the field matures, look for more examples of data solutions being offered
as hosted services.
 