Jonas: If I'm not familiar with data, then I generally don't even start. I recently
met Winfried Denk, who invented two-photon microscopy and is a very smart
applied-physicist guy who's received many, many, many awards. His comment
to me in this area was that the number-one thing you have to be able to do
is actually know what questions to ask. And so I try not to get involved in
projects where I don't know what the right questions are. And then generally,
if I know the questions, I understand the data well enough to then start
thinking about the modeling. The nice thing about modeling is that you can
fairly rapidly turn around and try a bunch of different things. But if you haven't
even looked at the data and done the most basic things, then it's very easy to
be led astray.
Gutierrez: How do you look at the data?
Jonas: Matplotlib in Python. I make a bunch of initial plots and then play
around with the data. A lot of the data I work with looks very different from
the kinds of data that show up more on the industry side of things. No one in
science really uses a relational database, because we either have time series,
or graphs, or images, or all these weird things. Rarely do we get relational
facts. So I don't end up using SQL that much. It's much more about writing a
bunch of custom scripts to parse through 100 gigabytes of time-series data
and look at different spectral bands or something similar.
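As an illustration of the kind of first-look script described here, the following is a minimal sketch, assuming NumPy and Matplotlib are installed. The function names (`band_power`, `plot_overview`), the synthetic 10 Hz signal, and the band edges are hypothetical choices made for the example; they are not taken from Jonas's own code.

```python
import numpy as np


def band_power(signal, sample_rate_hz, low_hz, high_hz):
    """Mean spectral power of `signal` in the band [low_hz, high_hz)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    power = np.abs(np.fft.rfft(signal)) ** 2
    in_band = (freqs >= low_hz) & (freqs < high_hz)
    return float(power[in_band].mean())


def plot_overview(signal, sample_rate_hz):
    """Plot the raw trace and its power spectrum as two stacked panels."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    t = np.arange(len(signal)) / sample_rate_hz
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    power = np.abs(np.fft.rfft(signal)) ** 2

    fig, (ax_time, ax_freq) = plt.subplots(2, 1, figsize=(8, 6))
    ax_time.plot(t, signal)
    ax_time.set_xlabel("time (s)")
    ax_time.set_ylabel("amplitude")
    ax_freq.semilogy(freqs, power)
    ax_freq.set_xlabel("frequency (Hz)")
    ax_freq.set_ylabel("power")
    fig.tight_layout()
    return fig


# Synthetic stand-in for real data (assumption): a 10 Hz sine plus weak noise.
sample_rate_hz = 1000
t = np.arange(0, 2.0, 1.0 / sample_rate_hz)
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

# Compare power in a band around the sine with power in a band far from it.
low_band = band_power(signal, sample_rate_hz, 8, 12)
high_band = band_power(signal, sample_rate_hz, 30, 80)
```

Calling `plot_overview(signal, sample_rate_hz)` draws the two panels. For a recording too large to hold in memory, the same functions could be applied chunk by chunk, though that part is left out of this sketch.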
Gutierrez: What do you look for in other people's work?
Jonas: On the research side, my answer is different from many of the people
I work with and other people in the field. One of my colleagues told me that
I read more papers than anyone they know. I don't actually really read most of
the papers. I read the title and the abstract, look at the figures, and then move
on. For example, when I evaluate machine learning papers, what I am looking
to find out is whether the technique worked or not. This is something that
the world needs to know—most papers don't actually tell you whether the
thing worked. It's really infuriating because most papers will show five dataset
examples and then show that they're slightly better on two different metrics
when comparing against something from 20 years ago. In academia, it's fine. In
industry, it's infuriating, because you need to know what actually works and
what doesn't.
So a lot of what I look for is: “Do I think that their approach was valid? Do I
know them?” The degree to which I will read papers from people I know and
trust is far higher than for those whom I don't know. People complain that
it's hard for new people to break into fields. Well, that's partly because at any
given time, 99 percent of the time, the new people are cranks. So a
lot of it is: “Do I find the structure of this model to be interesting? Do I think
they did inference properly? Did they ask the basic questions? Do I believe
those results? Is the answer something that I would have believed before
 