Big Data and Cloud Culture - To the Cloud: Big Data in a Turbulent World

between two variables either was created by one or more other variables

or the correlations themselves were trivial. Rather than find a needle in a

haystack, big data, as Nassim Taleb (2012) and David Brooks (2013) have

perceptively noted, often just leads to more haystacks. As Brooks (2013)

put it, “As we acquire more data, we have the ability to find many, many

more statistically significant correlations. Most of these correlations are

spurious and deceive us when we're trying to understand a situation. Falsity

grows exponentially the more data we collect. The haystack gets bigger,

but the needle we are looking for is still buried deep inside.”
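
The haystack effect can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not anything taken from Brooks or Taleb: it generates columns of pure random noise, so every "significant" correlation it finds is spurious by construction. The function names, the 30 observations per variable, and the cutoff of 0.361 (roughly the two-tailed 5 percent critical value of Pearson's r for 30 observations) are all choices made for this example.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def count_spurious(num_variables, num_observations=30, threshold=0.361, seed=0):
    """Count variable pairs whose |r| exceeds the threshold in pure noise.

    Every column is independent Gaussian noise, so any pair that clears
    the (assumed) significance threshold is a false finding.
    Returns (number_of_hits, number_of_pairs_tested).
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0, 1) for _ in range(num_observations)]
        for _ in range(num_variables)
    ]
    pairs = list(itertools.combinations(columns, 2))
    hits = sum(1 for x, y in pairs if abs(pearson(x, y)) > threshold)
    return hits, len(pairs)


if __name__ == "__main__":
    for k in (10, 50, 200):
        hits, pairs = count_spurious(k)
        print(f"{k} variables: {pairs} pairs tested, {hits} look significant")
```

The number of pairs tested grows with the square of the number of variables, so even at a fixed false-positive rate the count of spurious "findings" climbs quickly as more variables are collected, while the number of real relationships in this simulated data stays at zero.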

Two of the best means of addressing a mass of correlations, most of

which are spurious or trivial, employ strategies that tend to be ignored by

big data, particularly by its biggest boosters: theory and history. Theory

is the explanatory story that makes the most sense of the data. No story

makes perfect sense because the complexity of the data and the world it

represents can only be perfectly theorized by an explanation that is so

general that it ceases to be useful. Rather, the goal is to find a theory that is

both grounded in the data and makes reasonable sense. Some would argue

that this requires the inclusion of another concept routinely eschewed by

big-data enthusiasts: causality. It makes more sense to test data against a

causal model than to expect data, however large and diverse the collection,

to speak for itself. In fact, it is doubtful that the latter is possible because,

in or outside the cloud, data is not an entity independent of human conception

or contamination, but is created through human intelligence and

purpose, with all of their limitations and biases. Nevertheless, the choice is

not between causal theory and no theory at all. An intermediate position is

built upon mutual constitution, which maintains that concepts and data,

theory and evidence, construct or mutually constitute one another in an

ongoing process of building an argument. Arguments are then tested

against new data and alternative arguments.

There are other ways to constitute theory, but the point is that research

of any consequence, including studies using large data sets, cannot do

away with it. That is because the concepts expressed in the data presume

a theoretical perspective. As Brooks explained, “data is never raw; it's

always structured according to somebody's predispositions and values.

The end result looks disinterested, but, in reality, there are value choices

all the way through, from construction to interpretation” (ibid.). It may

be ambiguous or clear, weak or strong, but by virtue of our naming what
