The Students Speak - Doing Data Science


that practicing data science is inherently a collective endeavor. In the

beginning of the course, Rachel showed us a hub-and-spoke network

diagram. She had brought us all together and so was at the center. The

spokes connected each of us to her. It became her hope that new

friendships/ideas/projects/connections would form during the

course.

It's perhaps more important in an emergent field than in any other to

be part of a community. For data science in particular, it's not just useful

to your career—it's essential to your practice. If you don't read the

blogs, or follow people on Twitter, or attend meetups, how can you

find out about the latest distributed computing software, or a refutation

of the statistical approach of a prominent article? The community

is so tight-knit that when Cathy was speaking about MapReduce at a

meetup in April, she was able to refer a question to an audience member,

Nick Avteniev. Easy and immediate references to the experts of

the field are the norm. Data science's body of knowledge is changing

and distributed, to the extent that the only way of finding out what

you should know is by looking at what other people know. Having a

bunch of different lecturers kickstarted this process for us. All of them

answered our questions. All gave us their email addresses. Some even

gave us jobs.

Having listened to and conversed with these experts, we formed more

questions. How can we create a time series object in R? Why do we

keep getting errors in our plotting of our confusion matrix? What the

heck is a random forest? Here, we not only looked to our fellow students

for answers, but also went to online social communities such as

Stack Overflow, Google Groups, and R-bloggers. It turns out that there

is a rich support community out there for budding data scientists like

us trying to make our code run. And we weren't just getting answers

from others who had run into the same problems before us. No, these

questions were being answered by the pioneers of the methods. People

like Hadley Wickham, Wes McKinney, and Mike Bostock were providing

support for the packages they themselves wrote. Amazing.

Your Mileage May Vary

It's not as if there's some platonic repository of perfect data science

knowledge that you can absorb by osmosis. There are various good

practices from various disciplines, and different vocabularies and interpretations

for the same method (is the regularization parameter a
