for many years. Others claim that software tools will mature to the point of making
many of the most technical tasks of this work completely automated.
Some make the analogy that the role of the data scientist of today is a bit like that
of the “Webmaster” role of the late 1990s:4 a person who had an adequate understanding of the collection of technologies necessary to create and maintain a Web
site. These skills included a bit of server administration, a smattering of design, some
knowledge of JavaScript and Perl, and enough familiarity with HTML to know that
the <blink> and <marquee> tags should only be used ironically. More than a decade later,
it's clear that for many professional Web development jobs, specialized knowledge
is required. An established Web startup may employ a graphic designer, a frontend
developer, a systems developer, and even a user-experience researcher. Developers who
work on frontend and mobile versions of an application require different skills and
knowledge of different toolsets than those who are tasked with maintaining the health
of the application's backend systems. Many applications are now built on fully man-
aged systems such as Amazon Web Services, Rackspace, or platforms such as Heroku.
Some of the tasks once handled by the Webmaster were taken over by specialists, whereas others
became automated thanks to utility computing.
I am skeptical of the longevity of a do-it-all data-scientist role. Certainly many
professional statisticians are great programmers—and vice versa—but I have a hard
time believing that the ideal state is somewhere in between. Developers want to write
code, and statisticians are passionate about statistics. Many of the pain points around
large-scale data projects happen because the software for different parts of a data pipe-
line is simply not well connected, or the data itself is not normalized. People interested
in asking questions about data may be programmers, or statisticians may learn some
development skills to get their work done. Sometimes the most accessible solution to a
data problem requires mangling … I mean, managing a collection of different technolo-
gies, but I see all of these as temporary edge cases.
I think the reality is that the specialized roles that existed before the era of
MapReduce will remain much the same. In many ways, administering clusters of
nodes and writing glue software to connect big data pipes can be fun, but ultimately
these tasks will fall into the realm of automation brought on by economies of scale.
Innovations will trend toward interoperability, and some of the tricky transformation
and collection tasks undertaken by present day data scientists will become more and
more invisible.
A better way to look at the highly innovative space of data technologies is to think
about which skill sets in the field are the most susceptible to disruption. Disruptive
technologies often find footholds in industries in which one aspect is “good enough”
to be useful and provides a collection of enormous benefits that incumbents cannot
provide. The same might be said of professional roles. A statistician or math specialist
may be able to provide just enough programming know-how to get a task done.
4. http://blogs.msdn.com/b/microsoftenterpriseinsight/archive/2013/01/31/what-is-a-data-scientist.aspx