blueprints and three-dimensional, scaled-down versions. Molecular
biologists capture protein structure with three-dimensional visualizations
of the connections between amino acids. Statisticians and data
scientists capture the uncertainty and randomness of data-generating
processes with mathematical functions that express the shape and
structure of the data itself.
A model is our attempt to understand and represent the nature of
reality through a particular lens, be it architectural, biological, or
mathematical.
A model is an artificial construction in which all extraneous detail has
been removed or abstracted. After analyzing a model, always revisit these
abstracted details to see what might have been overlooked.
In the case of proteins, a model of the protein backbone with side chains
is, by itself, removed from the laws of quantum mechanics that
govern the behavior of the electrons, which ultimately dictate the
structure and actions of proteins. In the case of a statistical model, we
may have mistakenly excluded key variables, included irrelevant ones,
or assumed a mathematical structure divorced from reality.
Statistical modeling
Before you get too involved with the data and start coding, it's useful
to draw a picture of what you think the underlying process might be
with your model. What comes first? What influences what? What
causes what? What's a test of that?
But different people think in different ways. Some prefer to express
these kinds of relationships in terms of math. The mathematical expressions
will be general enough that they have to include parameters,
but the values of these parameters are not yet known.
In mathematical expressions, the convention is to use Greek letters for
parameters and Latin letters for data. So, for example, if you have two
columns of data, x and y, and you think there's a linear relationship,
you'd write down y = β₀ + β₁x. You don't know what β₀ and β₁ are in
terms of actual numbers yet, so they're the parameters.
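
To make the split between data and parameters concrete, here is a minimal sketch in Python of the linear relationship y = β₀ + β₁x. The text has not yet said how parameter values are found, so estimating them by ordinary least squares is an assumption made here for illustration. The function name fit_line and the five data points are hypothetical, and the points were made up to lie exactly on the line y = 1 + 2x.

```python
def fit_line(x, y):
    """Estimate b0 and b1 in y = b0 + b1 * x by ordinary least squares.

    Least squares is an assumption of this sketch, not a method
    prescribed by the surrounding text.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Spread of x around its mean; zero means every x value is identical.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    # Co-variation of x and y around their means.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope estimate
    b0 = y_mean - b1 * x_mean    # intercept estimate
    return b0, b1


# Hypothetical data (Latin letters): generated exactly from y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

# Estimated parameters (the Greek letters in the equation).
b0_hat, b1_hat = fit_line(x, y)
print(b0_hat, b1_hat)  # prints 1.0 2.0
```

Before fit_line runs, the parameters are unknowns in the equation; afterward they are numbers implied by the data. With real, noisy data the estimates would only approximate the underlying values.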
Other people prefer pictures and will first draw a diagram of data flow,
possibly with arrows, showing how things affect other things or what
happens over time. This gives them an abstract picture of the relationships
before choosing equations to express them.