example, that out of one hundred samples, the relationship would show up
ninety-five times—give no warrant to assert causality and to rule out the
possibility of a spurious relationship. Correlations help one to determine
which among a group of variables go together, or covary, and to rule
out with some confidence those that do not. But people often mistake
this for evidence of causality, or for certainty that the variables are tied
together independently of other variables that may very well be essential.
For example, just because the sale of umbrellas is highly correlated with
car accidents does not mean that one causes the other. Rather, it is the
presence of a third variable, rain, that influences both. In this case the
relationship between umbrella sales and accidents is spurious.
Big-data analysis also tends to be atheoretical. In fact, major proponents
boast that it frees people from coming up with hypotheses or theories
to be tested and allows the data to speak for itself (Anderson 2008).
Not every proponent of big data holds as strongly to this view, but most
accept that, given our ability to measure and monitor behavior, from the
“likes” posted on Facebook to how fast we drive, the goal of science
should be to apply mathematical procedures, such as correlations, and let
generalizations emerge from the data. The point, as Mayer-Schönberger
and Cukier emphasize, is that “no longer do we necessarily require a
valid substantive hypothesis about a phenomenon to begin to understand
our world” (2013, 55). Theory's guiding hand was necessary in the past
because there was not enough data to rely on the data alone for answers.
A world awash in data can now find, in the analogy often used by big-data
supporters, a needle in a haystack (Singh 2013). Replacing theories
and hypotheses are general areas of interest and specific questions that
the researcher believes big data and the cloud might answer. Anything
more rigorous would prematurely rule out entire areas where solutions
might be found.
The primary goal of big data is to be predictive. Find patterns deep
in the data and expect that, barring significant structural changes, they
will tell us what the future will be like. Determining why is less important
than predicting what will be. As a 2013 overview concludes, “We're
entering a world of constant data-driven predictions where we may not
be able to explain the reasons behind our decisions” (Mayer-Schönberger
and Cukier 2013, 17). Consider the example of Google's search for the
needle of insight into the spread of flu, a goal that has eluded experts at