Josh primed his topic with a thought experiment.
Thought Experiment
How would you build a human-powered airplane? What would you
do? How would you form a team?
Maybe you'd run an X prize competition. This is exactly what some
people did, for $50,000, in 1950. It took 10 years for someone to win
it. The story of the winner is useful because it illustrates that sometimes
you are solving the wrong problem.
Namely, the first few teams spent years planning, and then their planes
crashed within seconds. The winning team changed the question to:
how do you build an airplane you can put back together in four hours
after a crash? After quickly iterating through multiple prototypes, they
solved this problem in six months.
On Being a Data Scientist
Josh had some observations about the job of a data scientist. A data
scientist spends nearly all their time on data cleaning and preparation:
a full 90% of the work is this kind of data engineering. When deciding
between solving problems and finding insights, a data scientist solves
problems. A bit more on that: start with a problem, and make sure you
have something to optimize against. Parallelize everything you do.
It's good to be smart, but being able to learn fast is even better: run
experiments quickly to learn quickly.
Data Abundance Versus Data Scarcity
Most people think in terms of scarcity. They are trying to be
conservative, so they throw stuff away. Josh keeps everything. He's a
fan of reproducible research, so he wants to be able to rerun any phase
of his analysis. This is great for two reasons. First, when he makes a
mistake, he doesn't have to restart everything. Second, when he gets
new sources of data, it's easy to integrate them at the point in the
flow where it makes sense.
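
One way to picture "keep everything" is a pipeline that saves the output of every phase, so any single phase can be rerun without starting over. The following is only a minimal sketch of that idea, not Josh's actual tooling: the helper run_stage, the stage functions clean and total, the JSON files, and the toy data are all hypothetical names invented for illustration.

```python
import json
import tempfile
from pathlib import Path


def run_stage(name, func, inputs, cache_dir, force=False):
    """Run one pipeline stage, reusing its saved output unless force is set."""
    out_path = Path(cache_dir) / f"{name}.json"
    if out_path.exists() and not force:
        # Output from an earlier run was kept, so skip recomputation.
        return json.loads(out_path.read_text())
    result = func(inputs)
    out_path.write_text(json.dumps(result))
    return result


def clean(rows):
    # Toy cleaning rule (an assumption): drop records with a missing value.
    return [r for r in rows if r["value"] is not None]


def total(rows):
    # Toy analysis step (an assumption): sum the cleaned values.
    return sum(r["value"] for r in rows)


raw = [{"value": 3}, {"value": None}, {"value": 4}]
cache_dir = tempfile.mkdtemp()

cleaned = run_stage("clean", clean, raw, cache_dir)
answer = run_stage("total", total, cleaned, cache_dir)

# Rerun only the last phase, for example after fixing a mistake in total().
# The cleaned data is read back from disk rather than rebuilt.
cleaned_again = run_stage("clean", clean, raw, cache_dir)
answer_again = run_stage("total", total, cleaned_again, cache_dir, force=True)
```

Because each phase's output stays on disk, a new data source could be added as another stage and joined in wherever it fits, without touching the phases upstream of it.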