At Square, they try to maintain reusability and readability by
structuring code in different folders with distinct, reusable components
that provide semantics around the different parts of building a machine
learning model:
Model
    The learning algorithms
Signal
    Data ingestion and feature computation
Error
    Performance estimation
Experiment
    Scripts for exploratory data analysis and experiments
Test
    Test all the things
They only write scripts in the experiments folder, where they either tie
together components from model, signal, and error, or conduct
exploratory data analysis. Each time they write a script, it's more than
just a piece of code waiting to rot. It's an experiment that is revisited
over and over again to generate insight.
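
As a rough illustration of that layout, here is a minimal Python sketch of what one experiment script might look like. It is not Square's code: the names load_signal, LeastSquaresLine, holdout_rmse, and run_experiment are hypothetical, the data is synthetic, and the three components are kept in one file only so the example runs on its own. In a real project each would live in its own signal, model, or error folder and be imported by the script.

```python
"""Hypothetical experiment script tying together signal, model, and error."""
import random
from statistics import mean


# signal: data ingestion and feature computation (synthetic data, assumed)
def load_signal(n=200, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    # Assumed ground truth for the illustration: y = 2x + 1 plus noise.
    ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


# model: the learning algorithm (ordinary least squares on one feature)
class LeastSquaresLine:
    def fit(self, xs, ys):
        mx, my = mean(xs), mean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx
        return self

    def predict(self, xs):
        return [self.slope * x + self.intercept for x in xs]


# error: performance estimation (RMSE on a held-out tail of the data)
def holdout_rmse(model, xs, ys, train_fraction=0.75):
    cut = int(len(xs) * train_fraction)
    model.fit(xs[:cut], ys[:cut])
    preds = model.predict(xs[cut:])
    return mean((p - y) ** 2 for p, y in zip(preds, ys[cut:])) ** 0.5


# experiment: the only place where the components are wired together
def run_experiment():
    xs, ys = load_signal()
    model = LeastSquaresLine()
    rmse = holdout_rmse(model, xs, ys)
    return {"slope": model.slope, "intercept": model.intercept, "rmse": rmse}


if __name__ == "__main__":
    print(run_experiment())
```

Because the script only wires reusable pieces together and fixes its random seed, rerunning it later reproduces the same result, which is what makes it possible to revisit the experiment rather than redo it.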
What does such a discipline give you? Every time you run an
experiment, you should incrementally increase your knowledge. If that's
not happening, the experiment is not useful. This discipline helps you
make sure you don't do the same work again. Without it you can't even
figure out the things you or someone else has already attempted. Ian
further claims that “If you don't write production code, then you're
not productive.”
For more on what every project directory should contain, see Project
Template by John Myles White. For those students who are using R
for their classes, Ian suggests exploring and actively reading GitHub's
repository of R code. He says to try writing your own R package, and
make sure to read Hadley Wickham's devtools wiki. Also, he says that
developing an aesthetic sense for code is analogous to acquiring the
taste for beautiful proofs; it's done through rigorous practice and
feedback from peers and mentors.
For extra credit, Ian suggests that you contrast the implementations
of the caret package with scikit-learn. Which one is more extendable
and reusable? Why?