confident it would work. I was betting a lot on it. We had time. We
had resources. We had done what we thought would work, and it still
could have broken. Something could have happened.”
Debate over the value of “domain knowledge” has long polarized the
data community. Much of the promise of unsupervised learning, after
all, is overcoming a crippling dependence on our wonted categories
of social and scientific analysis, as seen in one of many celebrations
of the Obama analytics team. Daniel Wagner, the 29-year-old chief
analytics officer, said:
The notion of a campaign looking for groups such as “soccer
moms” or “waitress moms” to convert is outdated. Campaigns can
now pinpoint individual swing voters. White suburban women?
They're not all the same. The Latino community is very diverse
with very different interests. What the data permits you to do is to
figure out that diversity.
In productive tension with this escape from deadening classifications,
however, the movement to revalidate domain expertise within statistics
seems about as old as formalized data mining.
In a now-infamous Wall Street Journal article, Peggy Noonan mocked
the job ad for the Obama analytics department: “It read like politics
as done by Martians.” The campaign was simply insufficiently human,
with its war room both “high-tech and bloodless.” It went unmentioned
that the contemporaneous Romney ads read similarly.
Data science rests on algorithms but does not reduce to those
algorithms. The use of those algorithms rests fundamentally on what
sociologists of science call “tacit knowledge”: practical knowledge not
easily reducible to articulated rules, or perhaps impossible to reduce
to rules at all. Using algorithms well is fundamentally a very human
endeavor, something not particularly algorithmic.
No warning to young data padawans is as central as the many dangers
of overfitting: mistaking noise for signal in a given training set, or,
alternatively, learning a training set so closely that the model fails
to generalize properly. Avoiding overfitting requires a reflective use
of algorithms. Algorithms are enabling tools requiring us to reflect
more, not less.
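Overfitting is easiest to see in miniature. The sketch below is a
hypothetical illustration in Python with NumPy; the noisy sine curve,
sample sizes, and polynomial degrees are illustrative choices, not
drawn from the text. It compares training error against error on
held-out data as model complexity grows.

    import numpy as np

    rng = np.random.default_rng(0)

    # Small noisy training sample and a larger held-out sample,
    # both drawn from the same underlying curve.
    x_train = rng.uniform(0.0, 1.0, 15)
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, 15)
    x_test = rng.uniform(0.0, 1.0, 200)
    y_test = np.sin(2 * np.pi * x_test) + rng.normal(0.0, 0.2, 200)

    # Fit polynomials of increasing degree (a degree this high on so
    # few points may trigger a NumPy conditioning warning; it still runs).
    for degree in (1, 3, 12):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree:2d}: "
              f"train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

The high-degree fit drives training error toward zero while held-out
error climbs: the model has memorized the noise, exactly the mistaking
of noise for signal the warning names.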
In 1997 Peter Huber explained, “The problem, as I see it, is not one
of replacing human ingenuity by machine intelligence, but one of
assisting human ingenuity by all conceivable tools of computer science
and artificial intelligence, in particular aiding with the
improvisation of search tools and with keeping track of the progress
of an analysis.” The word “improvisation” is just right in pointing to
mastery of tools, contextual reasoning, and the virtue of avoiding
rote activity.
 