The other thing I think people are going to be really surprised by is how much
of a quantitative and computational science the life sciences will become. In
some sense, everyone's always saying this—it's kind of a trope at this point,
but it's only going to become increasingly true. Every time we look back,
we're much better than we were five years ago. We always still hate ourselves
though, because we're never where we want to be—but I think we'll get
there.
Gutierrez: What is something someone starting out should try to understand deeply?
Jonas: They should understand probability theory forwards and backwards.
I'm at the point now where everything else I learn, I then map back into
probability theory. It's great because it provides this amazing, deep, rich basis
set along which I can project everything else out there. There's a book by E. T.
Jaynes called Probability Theory: The Logic of Science, and it's our bible.[8] We really
buy it in some sense. The reason I like the probabilistic generative approach
is you have these two orthogonal axes—the modeling axis and the inference
axis. Which basically translates into how do I express my problem and how
do I compute the probability of my hypothesis given the data? The nice thing
I like from this Bayesian perspective is that you can engineer along each of
these axes independently. Of course, they're not perfectly independent, but
they can be close enough to independent that you can treat them that way.
When I look at things like deep learning or any kind of LASSO-based linear
regression systems, which is so much of what counts as machine learning
these days, they're engineering along either one axis or the other. They've kind
of collapsed that down. Using these LASSO-based techniques as an engineer, it
becomes very hard for me to think about: “If I change this parameter slightly,
what does that really mean?” Linear regression as a model has a very clear
linear additive Gaussian model baked into it. Well, what if I want things to
look different? Suddenly all of these regularized least squares things fall apart.
The inference technology just doesn't even accept that as a thing you'd want
to do.
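Spelled out (this is the standard textbook reading, added here for reference rather than quoted from Jonas), the model baked into these estimators is

    y = Xw + eps,   eps ~ N(0, sigma^2 I)

Ordinary least squares, argmin_w ||y - Xw||^2, is the maximum-likelihood estimate under that Gaussian noise assumption, and the LASSO objective, argmin_w ||y - Xw||^2 + lambda*||w||_1, is the corresponding MAP estimate under a Laplace prior on w. Asking for noise or structure that "looks different" means abandoning that particular objective, which is the collapse being described.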
The reason my entire team and I fell in love with the probabilistic generative
approach was that we could rationally engineer in an intelligent way with it.
We could independently think about how to make the model better or how
to solve the inference problem. A lot of times you'll find that by making
the model better—that is by moving along the modeling axis—that infer-
ence actually becomes easier, because you're more able to capture interesting
structure in your data.
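A minimal sketch of what that separation can look like in code (a toy example with assumed names, not the team's actual tooling): the observation model and the inference routine live in separate functions, so swapping a Gaussian likelihood for a heavier-tailed one leaves the inference code untouched.

    # Toy illustration of the two axes: the model (a likelihood you can swap out)
    # and the inference routine (a brute-force grid posterior) are separate
    # pieces, so either can be changed without touching the other.
    import numpy as np
    from scipy import stats

    # Modeling axis: two candidate observation models for the same data.
    def gaussian_loglik(theta, data):
        # Observations assumed to be an unknown mean theta plus Gaussian noise.
        return stats.norm.logpdf(data, loc=theta, scale=1.0).sum()

    def heavy_tailed_loglik(theta, data):
        # Same structure, but Student-t noise to tolerate outliers.
        return stats.t.logpdf(data, df=3, loc=theta, scale=1.0).sum()

    # Inference axis: indifferent to which model gets plugged in.
    def grid_posterior(loglik, data, grid):
        logp = np.array([loglik(theta, data) for theta in grid])  # flat prior over grid
        logp -= logp.max()                                        # numerical stability
        p = np.exp(logp)
        return p / p.sum()

    data = np.array([0.9, 1.1, 1.3, 0.8, 6.0])   # last observation is an outlier
    grid = np.linspace(-2.0, 8.0, 1001)
    post_gauss = grid_posterior(gaussian_loglik, data, grid)
    post_robust = grid_posterior(heavy_tailed_loglik, data, grid)
    print(grid[post_gauss.argmax()], grid[post_robust.argmax()])

Moving along the modeling axis here (Gaussian to Student-t) changes what the posterior says about the outlier, while the grid_posterior routine, the inference axis, stays exactly as it was.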
[8] E. T. Jaynes, Probability Theory: The Logic of Science (Cambridge University Press, 2003).
 