The dating spreadsheet has become somewhat of a joke now, but it actually
really helped. Everyone talks about “quantified self” and everyone wants to
track themselves. But no one's writing down a lot of the interpersonal
interactions that actually matter. Who cares how many steps you took last
week? Who did you kiss? So I think the dating spreadsheet is a good argument
for applying the quantified-self approach to this kind of data.
Gutierrez: What does the future of data science or computational
neurobiology look like?
Jonas: I know that everyone wants to talk about big data. It's now this phrase
that has somehow entered the lexicon in a horrible sort of way. And then there's
the idea that being a data scientist is “the sexiest job of the 21st century”,
admittedly said by a data scientist in an article he wrote, so not really
objective. Sure, it was in Harvard Business Review... but come on! I actually think a lot of the
future is in small data. Or what my friends at Bitsight call “Grande Data”, as in
the Starbucks cup sizes—it's neither Tall (short) nor Venti (large); it's Grande
(medium). The things you can discover in a gig of data are often far more
interesting than the things you can discover in a terabyte of data, because
with a gig of data, you can ask more interesting questions. You can build more
interesting models. You can understand more about what's going on.
On one hand, there's the Peter Norvig philosophy that with enough data you
can use simple models, which is true if you are Facebook, Google, Walmart,
or companies of that size. Otherwise, most companies have a thousand, or
ten thousand, or even a million customers, which is nowhere near what you
actually need for Norvig's philosophy. Most people who are buying and using
technologies like Hadoop are using them as a recording engine, where they comb
through all this data, then stick it in an RDBMS and actually do their data
analysis in R and SQL. I think that as the big data hype cycle crests, we're going
to see more and more people recognizing that what they really want to be
doing is asking interesting questions of smaller data sets.
On the computational neuroscience side, the data are coming and the data
sets will be getting bigger over the next ten years. Right now, if we're not
building the right models, I think we're going to be a little bit screwed in ten
years. What are we going to do—linear regressions? I've talked to very smart,
famous machine learning people at Google, and I asked them, “What do you
do all day?” And they replied, “Well, you know, we do feature engineering and
then run linear regressions on our largest data.” “But you wrote a book!”
I thought. “What's going on?”
I hate the phrase “predictive analytics.” If you think that the world is all about
predictive analytics, then the entire universe is in some sense solved and
uninteresting. If you care about what's going on inside the box and if you want
technology to let you see new things, then that's kind of a green field right now.
 