already built a solution. Because of the rapid innovation and experimentation in the
data-technology field, the idea of sticking to core competencies is not always so cut
and dry. In the case of Big Data technology, there hasn't been much time for industry
best practices to be fully fleshed out, and early adopters are still coming up with success
stories. If the data technology that you are evaluating is a core aspect of your organization
that will help provide differentiation, then building solutions in-house might
be the right choice. In contrast, organizations that care more about data technology
getting out of the way should start by attempting to purchase a solution rather than
devoting time to building one.
Some commercial software vendors provide large, scalable databases with support
personnel, hardware, and training for a premium fee. However, if you know that all
you need to do is process a large amount of last year's sales data for a year-end report,
it might be more reasonable to stay flexible and build a solution using available open-source
technologies. If you have determined that you only need to collect some data
and query it quickly, then it is probably a sign that you should be looking at a solution
built with an open-source technology such as Hadoop and Hive, or Spark and Shark.
In my experience, there are three major considerations when trying to determine
whether to build your own solution or buy. The first is the most obvious: What is the
cost of solving the problem? This factor is complex; the cost of maintaining software
can be hard to predict, and the ability to execute on the solution is highly dependent
on organizational personnel. Another factor is how to deal with future scalability.
How will the solution you develop change as data volumes grow? Will the system
need to be completely rebuilt if either data volume or throughput changes? Finally,
there is the question of audience: For whom is the solution ultimately being developed?
Consider an organization trying to analyze a large amount of internal data.
Later on, if the same organization needs to provide external access to some of this data
for its customers, it may need to deploy a Web-based dashboard or an API. Serving this
new audience may require a completely different set of technologies.
A Playbook for the Build versus Buy Problem
Some of the concepts and technologies of distributed data systems are still in their
early stages of adoption. There is not yet a huge amount of literature that describes
popularly accepted best practices for making decisions about data technologies. Furthermore,
many of the innovative technologies that have been around the longest, such
as the Apache Hadoop project, are already starting to see disruptive competition from
newer frameworks such as Spark.
There are many ways you can approach the problem of building solutions versus
buying. In the currently murky world of large-scale data challenges, I have observed
some patterns that work well for navigating the process of evaluating
build versus buy scenarios. First, evaluate your current investments in data technologies
and infrastructure with a particular understanding of the personnel and culture
of your organization. Next, gain some insight into actual pain points by investing in
 
 