As examples, I've observed experts in the world of star schemas baffled by a lack of SQL-92 compliance in a cutting-edge analytical database, even while the system was tearing through terabytes of data. I've also observed Web developers using a sharded and schemaless MongoDB document store wondering why their analyst wants them to provide a specific schema that can describe any object in the database. In these debates over utility, the reality is that everybody is sort of right. The greatest value comes from systems that are interoperable. The major trend we are seeing is that the different camps of entrenched users are peeking over the wall, checking each other out, and asking questions about how the other systems work.
Summary
Although data-analysis technology is in a state of rapid innovation and change, patterns are emerging that help elucidate future trends. Clearly, the growth of utility computing as a solution to power Web and mobile applications is driving both user data and the applications that process this data into the cloud. The result is the growth of new applications that live completely in the cloud. Not only are these types of applications generating a great deal of data, but the tools that can process this data are being built with utility-computing resources as well.
The Apache Hadoop project has become synonymous with tech hype around Big
Data solutions. With a huge user base and a growing collection of corporate shepherds,
the Hadoop ecosystem will obviously grow and evolve to become more enterprise-friendly and interoperable with existing business-analytics tools. However, tools built
on top of the Hadoop ecosystem are not always the best solution to data challenges.
The gaps exposed when a MapReduce framework is applied to certain use cases have begun to direct the spotlight toward other types of data tools. The growth of new distributed analytical databases is a good example of the type of non-MapReduce technology being added to the mainstream data landscape. These insights are also helping people understand best practices around when to use MapReduce and when to stick with traditional tools such as relational databases and spreadsheets.
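The trade-off above can be sketched with a toy example: the classic word count, expressed as the map, shuffle, and reduce phases that a framework such as Hadoop distributes across many machines. This is a single-machine Python illustration of the pattern, not Hadoop code, and the corpus is invented for the example; note that in a relational database the same aggregation is a one-line GROUP BY query, which is precisely why simple aggregations over modest data rarely justify a MapReduce cluster.

```python
from collections import defaultdict

# Toy corpus standing in for a distributed dataset (hypothetical data).
documents = ["big data tools", "data tools evolve", "big questions"]

# Map phase: emit a (word, 1) pair for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group the emitted pairs by key (the word).
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: sum the counts collected for each word.
word_counts = {word: sum(counts) for word, counts in grouped.items()}

print(word_counts)
# → {'big': 2, 'data': 2, 'tools': 2, 'evolve': 1, 'questions': 1}
```

The equivalent relational formulation would be roughly `SELECT word, COUNT(*) FROM words GROUP BY word` — the framework only earns its overhead when the shuffle must move data between machines.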
The role of developers, analysts, and systems administrators who work in this space will certainly change as well. For a short time, at least, it appears there will be growing demand for the do-it-all data scientist: a role that requires both asking the right questions and having the technical skills to navigate the sometimes disparate collection of tools necessary to find the answer. Generally, the roles that require building narratives and telling stories with data will likely remain much as they were before tools like Hadoop were available. The need for statisticians, applied mathematicians, model builders, and researchers seems poised not only to grow but perhaps to become greater than ever. Administrative work involving the more technical aspects of what is currently considered “data science,” such as administering clusters of virtual servers, is likely moving toward software automation.