for many years. Others claim that software tools will mature to the point of making
many of the most technical tasks of this work completely automated.
Some make the analogy that the role of the data scientist of today is a bit like that
of the “Webmaster” role of the late 1990s:4 a person who had an adequate understanding of the collection of technologies necessary to create and maintain a Web
site. These skills included a bit of server administration, a smattering of design, some
knowledge of JavaScript and Perl, and enough familiarity with HTML to know that
the <blink> and <marquee> tags should only be used ironically. More than a decade later,
it's clear that for many professional Web development jobs, specialized knowledge
is required. An established Web startup may employ a graphic designer, a frontend
developer, a systems developer, and even a user-experience researcher. Developers who
work on frontend and mobile versions of an application require different skills and
knowledge of different toolsets than those who are tasked with maintaining the health
of the application's backend systems. Many applications are now built on fully man-
aged systems such as Amazon Web Services, Rackspace, or platforms such as Heroku.
Some of the tasks once handled by the Webmaster were taken over by specialists, whereas others
became automated thanks to utility computing.
I am skeptical of the longevity of a do-it-all data-scientist role. Certainly many
professional statisticians are great programmers—and vice versa—but I have a hard
time believing that the ideal state is somewhere in between. Developers want to write
code, and statisticians are passionate about statistics. Many of the pain points around
large-scale data projects happen because the software for different parts of a data pipe-
line is simply not well connected, or the data itself is not normalized. People interested
in asking questions about data may be programmers, or statisticians may learn some
development skills to get their work done. Sometimes the most accessible solution to a
data problem requires mangling … I mean, managing a collection of different technolo-
gies, but I see all of these as temporary edge cases.
I think the reality is that the specialized roles that existed before the era of
MapReduce will remain much the same. In many ways, administering clusters of
nodes and writing glue software to connect big data pipes can be fun, but ultimately
these tasks will fall into the realm of automation brought on by economies of scale.
Innovations will trend toward interoperability, and some of the tricky transformation
and collection tasks undertaken by present day data scientists will become more and
more invisible.
A better way to look at the highly innovative space of data technologies is to think
about which skill sets in the field are the most susceptible to disruption. Disruptive
technologies often find footholds in industries in which one aspect is “good enough”
to be useful and provides a collection of enormous benefits that incumbents cannot
provide. The same might be said of professional roles. A statistician or math specialist
may be able to provide just enough programming know-how to get a task done.
4. http://blogs.msdn.com/b/microsoftenterpriseinsight/archive/2013/01/31/what-is-a-data-scientist.aspx