between two variables either was created by one or more other variables
or the correlations themselves were trivial. Rather than find a needle in a
haystack, big data, as Nassim Taleb (2012) and David Brooks (2013) have
perceptively noted, often just leads to more haystacks. As Brooks (2013)
put it, “As we acquire more data, we have the ability to find many, many
more statistically significant correlations. Most of these correlations are
spurious and deceive us when we're trying to understand a situation. Falsity
grows exponentially the more data we collect. The haystack gets bigger,
but the needle we are looking for is still buried deep inside.”
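The growth in spurious correlations that Brooks describes can be illustrated with a small simulation. The sketch below is not drawn from Taleb or Brooks; it assumes independent Gaussian noise columns and uses the Fisher z approximation with a 1.96 cutoff (roughly the two-sided 5% level) as a stand-in for a significance test. The function names `pearson` and `count_spurious` are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(num_vars, n_obs=100, seed=0):
    """Return (pairs tested, pairs flagged significant) for pure-noise data.

    Every column is independent random noise, so any pair that is flagged
    is spurious by construction.
    """
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(num_vars)]
    pairs = 0
    significant = 0
    for x, y in combinations(columns, 2):
        pairs += 1
        # Fisher z approximation: atanh(r) * sqrt(n - 3) is roughly standard
        # normal when the true correlation is zero.
        z = math.atanh(pearson(x, y)) * math.sqrt(n_obs - 3)
        if abs(z) > 1.96:
            significant += 1
    return pairs, significant


if __name__ == "__main__":
    for k in (10, 20, 40):
        pairs, hits = count_spurious(k)
        print(f"{k} variables: {pairs} pairs tested, {hits} flagged significant")
```

In this sketch the number of pairs tested grows roughly as the square of the number of variables, and about one pair in twenty is flagged even though no real relationship exists, so collecting more variables produces more false leads.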
Two of the best means of addressing a mass of correlations, most of
which are spurious or trivial, employ strategies that tend to be ignored by
big data, particularly by its biggest boosters: theory and history. Theory
is the explanatory story that makes the most sense of the data. No story
makes perfect sense because the complexity of the data and the world it
represents can only be perfectly theorized by an explanation that is so
general that it ceases to be useful. Rather, the goal is to find a theory that is
both grounded in the data and makes reasonable sense. Some would argue
that this requires the inclusion of another concept routinely eschewed by
big-data enthusiasts: causality. It makes more sense to test data against a
causal model than to expect data, however large and diverse the collection,
to speak for itself. In fact, it is doubtful that the latter is possible because,
in or outside the cloud, data is not an entity independent of human
conception or contamination, but is created through human intelligence and
purpose, with all of their limitations and biases. Nevertheless, the choice is
not between causal theory and no theory at all. An intermediate position is
built upon mutual constitution, which maintains that concepts and data,
theory and evidence, construct or mutually constitute one another in an
ongoing process of building an argument. Arguments are then tested
against new data and alternative arguments.
There are other ways to constitute theory, but the point is that research
of any consequence, including studies using large data sets, cannot do
away with it. That is because the concepts expressed in the data presume
a theoretical perspective. As Brooks explained, “data is never raw; it's
always structured according to somebody's predispositions and values.
The end result looks disinterested, but, in reality, there are value choices
all the way through, from construction to interpretation” (ibid.). It may
be ambiguous or clear, weak or strong, but by virtue of our naming what