general US population), whose situation was not so bad that they didn't
have time to tweet.
Note, too, that in this case, if you didn't have the context and didn't
know about Hurricane Sandy, you wouldn't know enough to interpret this
data properly.
Sampling
Let's rethink what the population and the sample are in various
contexts.
In statistics we often model the relationship between a population and
a sample with an underlying mathematical process. So we make simplifying
assumptions about the underlying truth, the mathematical
structure, and shape of the underlying generative process that created
the data. We observe only one particular realization of that generative
process, and that realization is the sample.
So if we think of all the emails at BigCorp as the population, and if we
randomly sample from that population by reading some but not all
emails, then that sampling process would create one particular sample.
However, if we resampled we'd get a different set of observations.
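The resampling idea above can be made concrete with a small simulation. This is only a sketch under stated assumptions: the "population" is a made-up list of email word counts (not real BigCorp data), and the sample size of 200 and the 1,000 repetitions are arbitrary choices for illustration. Each repetition reads a different random subset of emails and records the average word count, so the estimates vary from one sample to the next.

```python
import random
import statistics

def sample_mean_length(population, sample_size, rng):
    """Read `sample_size` emails at random (without replacement)
    and return their average word count."""
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample)

rng = random.Random(0)

# Hypothetical population: word counts for 10,000 emails (invented data).
population = [rng.randint(5, 500) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Repeat the sampling process many times; each run is one realization.
estimates = [sample_mean_length(population, 200, rng) for _ in range(1_000)]

# How much the estimate moves from one sample to the next.
spread = statistics.stdev(estimates)

print("population mean:", round(population_mean, 1))
print("lowest and highest sample mean:", round(min(estimates), 1), round(max(estimates), 1))
print("standard deviation of the sample means:", round(spread, 1))
```

In practice we see only one of these 1,000 estimates; the simulation shows how far that single number could plausibly sit from the population value.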
The uncertainty created by such a sampling process has a name: the
sampling distribution. But like that 2010 movie Inception with
Leonardo DiCaprio, where he's in a dream within a dream within a dream,
it's possible to instead think of the complete corpus of emails at
BigCorp as not the population but as a sample.
This set of emails (and here is where we're getting philosophical, but
that's what this is all about) could actually be only one single realization
from some larger super-population, and if the Great Coin Tosser in the
sky had spun again that day, a different set of emails would have been
observed.
In this interpretation, we treat this set of emails as a sample that we
are using to make inferences about the underlying generative process
that is the email writing habits of all the employees at BigCorp.
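The super-population view can be sketched the same way. The generative process below is purely hypothetical: three employees, each with an invented probability of sending an email in any given time slot, and 48 slots in a day. Nothing in the text specifies such a model; it only stands in for "email writing habits". Running the process once gives one complete corpus; running it again gives a different one, even though the underlying habits have not changed.

```python
import random
import statistics

def realize_corpus(send_probs, slots, rng):
    """One realization of the day: number of emails each employee sends,
    given a per-slot send probability (a hypothetical model of habits)."""
    return [sum(rng.random() < p for _ in range(slots)) for p in send_probs]

rng = random.Random(42)

# Assumed habits for three employees (invented values).
send_probs = [0.1, 0.3, 0.5]
slots = 48  # assumed number of time slots in a day

# The corpus we actually observe is a single draw from the process.
observed = realize_corpus(send_probs, slots, rng)

# Other days the Great Coin Tosser could have produced instead.
worlds = [realize_corpus(send_probs, slots, rng) for _ in range(200)]
totals = [sum(world) for world in worlds]

print("observed emails per employee:", observed)
print("average total across alternate days:", round(statistics.mean(totals), 1))
```

Inference in this interpretation targets the values in `send_probs`, the habits themselves, rather than the counts in any one observed corpus.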