As examples, I've observed experts in the world of star schemas baffled by a lack of SQL-92 compliance in a cutting-edge analytical database, even while the system was tearing through terabytes of data. I've also observed Web developers using a sharded and schemaless MongoDB document store wondering why their analyst wants them to provide a specific schema that can describe any object in the database. In these debates over utility, the reality is that everybody is sort of right. The greatest value comes from systems that are interoperable. The major trend we are seeing is that the different camps of entrenched users are peeking over the wall, checking each other out, and asking questions about how the other systems work.
Summary
Although data-analysis technology is in a state of rapid innovation and change, patterns are emerging that help elucidate future trends. Clearly, the growth of utility computing as a solution to power Web and mobile applications is driving both user data and the applications that process this data into the cloud. The result is the growth of new applications that live completely in the cloud. Not only are these types of applications generating a great deal of data, but the tools that can process this data are being built with utility-computing resources as well.
The Apache Hadoop project has become synonymous with tech hype around Big
Data solutions. With a huge user base and a growing collection of corporate shepherds,
the Hadoop ecosystem will obviously grow and evolve to become more enterprise-friendly and interoperable with existing business-analytics tools. However, tools built
on top of the Hadoop ecosystem are not always the best solution to data challenges.
The gaps exposed when a MapReduce framework is applied to certain use cases have begun to direct the spotlight toward other types of data tools. The growth of new distributed analytical databases is a good example of the type of non-MapReduce technology being added to the mainstream data landscape. These insights are also helping people understand best practices around when to use MapReduce and when to stick with traditional tools such as relational databases and spreadsheets.
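The trade-off above can be sketched with a toy example: the classic word count, expressed as the map, shuffle, and reduce phases that a framework such as Hadoop distributes across many machines. This is a single-machine Python illustration of the pattern, not Hadoop code, and the corpus is invented for the example; note that in a relational database the same aggregation is a one-line GROUP BY query, which is precisely why simple aggregations over modest data rarely justify a MapReduce cluster.

```python
from collections import defaultdict

# Toy corpus standing in for a distributed dataset (hypothetical data).
documents = ["big data tools", "data tools evolve", "big questions"]

# Map phase: emit a (word, 1) pair for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group the emitted pairs by key (the word).
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: sum the counts collected for each word.
word_counts = {word: sum(counts) for word, counts in grouped.items()}

print(word_counts)
# → {'big': 2, 'data': 2, 'tools': 2, 'evolve': 1, 'questions': 1}
```

The equivalent relational formulation would be roughly `SELECT word, COUNT(*) FROM words GROUP BY word` — the framework only earns its overhead when the shuffle must move data between machines.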
The role of developers, analysts, and systems administrators who work in this space will certainly change as well. For a short time, at least, it appears there will be growing demand for the do-it-all data scientist: a role that requires both asking the right questions and having the technical skills to navigate the sometimes disparate collection of tools necessary to find the answer. Generally, the roles that require building narratives and telling stories with data will likely remain much as they were before tools like Hadoop were available. The need for statisticians, applied mathematicians, model builders, and researchers seems poised not only to grow but perhaps to become greater than ever. Administrative work involving the more technical aspects of what is currently considered “data science,” such as administering clusters of virtual servers, is likely moving toward software automation.