been doing algebraic topology for a long time, and we're going to then teach
them quantitative finance, and this is going to be a good scheme.” In some
sense, it obviously worked out very well for them, but especially on the data
side, data analysis is so much messier than actual math. I have friends who
work on these topology-based approaches, and I'm like, “You realize these
manifolds totally evaporate when you actually throw noise into the system.
How do you think this is really going to play out here?” So I would much
rather someone be computationally skilled. I'm willing to trade off what
their Putnam score was for how many open source GitHub projects they've
committed to in the past.
I'm also very skeptical of this notion where a data scientist comes in without
the domain knowledge and starts producing work. I think you actually need
to care about the domain. I do think that a lot of the interesting problems,
especially those I'm interested in, necessitate that you have already been doing
work in the area for a while. So rarely do I find myself hiring someone who
just has data science experience.
One of the things I've seen a lot in the neuroscience community—or in
industry even—is that you get people who really like math showing up and
being like, “How can we apply this thing I have to your problem?” They just
want to do the math and they don't really care about the application. But if
you don't actually care about the underlying problem, then you're not going to
be willing to make the compromises necessary to understand how to guide
your own work. In academics or industry, if you're not actually speaking in
a language that your customers understand, then you will have a nice time
talking, but no one will really listen to you.
Gutierrez: What is something you know that you think people will be wowed
by five years from now?
Jonas: Either that Bayesian nonparametric models let you see things in data
that you didn't know were there or that Markov chain Monte Carlo actually
scales to data at a size you care about. Being properly probabilistic solves
so many of the problems we face in machine learning, like overfitting and
complicated transform issues that I still don't fully understand. There's an
entire set of machine learning work that starts with the predicate that your
data are a fully observed, real-valued matrix in R^{n×m}. From
my point of view, problems almost never look like that. This predicate forces
you to do all this stuff with your data to try and force it to look like that. And
then, once you have it in that form, you do a bunch of linear regressions. I'm
of the opinion that it's better to do slightly more sophisticated modeling here
by modeling the likelihood function and taking a generative approach. I think
that in five years, that's going to be the way most people do things. I think it's
inevitable. However, I think it's going to be a lot of work to get there.
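
As a concrete illustration of the generative approach described above, here is a minimal sketch (not from the interview) of a Metropolis sampler for the mean of a Gaussian model. Rather than imputing or dropping rows to force the data into a fully observed matrix, the likelihood is simply evaluated over the entries that were actually observed; the NaNs never have to be filled in. The function name, step size, and toy data are all illustrative assumptions.

```python
import numpy as np

def metropolis_gaussian_mean(data, n_samples=5000, step=0.5, seed=0):
    """Draw posterior samples of mu for y_i ~ N(mu, 1) with a flat prior,
    using only the observed (non-NaN) entries in the likelihood."""
    rng = np.random.default_rng(seed)
    obs = data[~np.isnan(data)]  # generative view: missing entries are just absent terms

    def log_lik(mu):
        # Gaussian log-likelihood up to an additive constant
        return -0.5 * np.sum((obs - mu) ** 2)

    mu = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = mu + step * rng.normal()  # random-walk proposal
        # Accept with probability min(1, p(proposal) / p(mu))
        if np.log(rng.uniform()) < log_lik(proposal) - log_lik(mu):
            mu = proposal
        samples.append(mu)
    return np.array(samples)

# Toy data: true mean 3.0, with ~20% of entries missing at random
rng = np.random.default_rng(1)
y = rng.normal(3.0, 1.0, size=200)
y[rng.uniform(size=200) < 0.2] = np.nan

post = metropolis_gaussian_mean(y)
posterior_mean = post[1000:].mean()  # discard burn-in
```

The same structure scales up: swap in a richer likelihood (mixtures, nonparametric priors) and the missing-data handling comes along for free, which is the point being made about modeling the likelihood directly instead of first coercing everything into a complete matrix.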