specifically geared toward developers building new software but are not the right
fit for analysts. Other tools require extensive management of hardware infrastructure,
whether for virtual or physical machines, making the environment difficult to navigate.
People looking for data solutions should accept that ambiguity around best practices
is currently the norm. Similarly, in the current landscape, it may not be possible to
buy a commercial-software solution for every use case. In many cases, parts of a data
pipeline will need to be built.
Large data challenges are often best solved using distributed software that runs on
a cluster of commodity hardware. Examples of this type of technology include the
open-source Apache Hadoop framework and many available distributed databases.
Building an in-house cluster of physical machines can offer good performance per
dollar for some applications, but the total cost may be prohibitive once maintenance
and software administration are included.
For many solutions, it makes more sense to use clusters of virtual servers hosted
by an off-site data center. This can take the form of either a private cloud, featuring
dedicated leased hardware, or a public cloud, which is typically a collection of virtual-
ized servers running on underlying hardware shared by many customers. Both of these
cloud models can be more cost effective than managing physical hardware in house.
The public cloud model is especially useful for organizations that are in the process of
evaluating new software or scaling up capacity. Distributed computing instances can
be grown or shrunk with demand, helping to keep costs manageable. As a rule, avoid
dealing with physical hardware investments whenever possible.
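The cost advantage of growing and shrinking capacity with demand can be made concrete with a bit of arithmetic. The sketch below compares a fixed fleet sized for peak load against an elastic fleet that matches hourly demand; the hourly rate and the demand curve are hypothetical values for illustration, not real cloud prices.

```python
# Illustrative comparison of fixed vs. elastic provisioning costs.
# HOURLY_RATE and the demand curve are assumed values, not real prices.

HOURLY_RATE = 0.50  # assumed cost per instance-hour

# Hypothetical number of instances needed in each hour of a day
# (load peaks midday, repeated for a 24-hour cycle).
demand = [2, 2, 2, 4, 8, 12, 16, 16, 12, 8, 4, 2] * 2  # 24 hourly values

# A fixed fleet must be sized for peak demand around the clock.
fixed_cost = max(demand) * len(demand) * HOURLY_RATE

# An elastic fleet grows and shrinks to match demand hour by hour.
elastic_cost = sum(demand) * HOURLY_RATE

print(f"fixed:   ${fixed_cost:.2f}")    # cost of always running the peak fleet
print(f"elastic: ${elastic_cost:.2f}")  # cost of demand-matched capacity
```

With these assumed numbers the elastic fleet costs less than half as much as the fixed one, which is why the public cloud model suits evaluation and scale-up phases where demand is uneven.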
In the IT world, there are many guidelines and best practices for determining when
to make an investment in an existing product. Often dedicated hardware or software
solutions may not even be necessary to solve a particular data problem. In order to
determine whether your organization has the skills necessary to build and maintain
an in-house data solution, start by implementing a small proof-of-concept project. If
the audience for your data solution consists mainly of analysts within your organiza-
tion, look to buy solutions that focus on ease of use and stability. On the other hand, if
solving the data challenge may provide a considerable competitive advantage, consider
focusing first on evaluating the potential for building custom solutions.
Reduce the number of variables necessary to understand the requirements for
building a solution. If your organization is considering building a solution to cope
with a data challenge, it can be effective to scope the evaluation effort using a small
subset of data and a single-machine proof of concept. Next, consider the challenges
that will come with scale. Some software solutions can be pushed to larger and larger
workloads, but realizing that scale may require considerable engineering effort.
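One way to scope such an evaluation is to draw a uniform sample of the full dataset that fits on a single machine. The sketch below uses reservoir sampling, which streams the data once and keeps only the sample in memory; the file name and sample size are placeholders for illustration.

```python
# Sketch: scoping a proof of concept with a uniform sample of a large
# dataset. Reservoir sampling reads the stream once and holds only
# k items in memory, so the input can be far larger than RAM.
import random

def reservoir_sample(items, k, seed=42):
    """Return k items chosen uniformly from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # replace with decreasing probability
            if j < k:
                sample[j] = item
    return sample

# Hypothetical usage: stream a large log once, keep 1,000 records
# for the single-machine proof of concept.
# with open("events.log") as f:
#     subset = reservoir_sample(f, k=1000)
subset = reservoir_sample(range(1_000_000), k=1000)
print(len(subset))  # 1000
```

Because the sample is taken in a single pass, the same script can later be pointed at the full dataset on a cluster without changing the evaluation logic.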
Overall, the software tools available for collecting, processing, and analyzing large
datasets are in a state of flux, and best practices and common patterns are still being
developed. As the field matures, look for more examples of data solutions being offered
as hosted services.
 