1.2.1 Total Information Awareness
In 2002, the Bush administration put forward a plan to mine all the data it could find,
including credit-card receipts, hotel records, travel data, and many other kinds of information,
in order to track terrorist activity. This idea naturally caused great concern among privacy
advocates, and the project, called TIA, or Total Information Awareness, was eventually
killed by Congress, although it is unclear whether the project in fact exists under another
name. It is not the purpose of this topic to discuss the difficult issue of the privacy-security
tradeoff. However, the prospect of TIA or a system like it does raise technical questions
about its feasibility and the realism of its assumptions.
Many have raised the following concern: if you look at so much data, and you try to find within
it activities that look like terrorist behavior, are you not going to find many innocent
activities, or even illicit activities that are not terrorism, that will result in visits from
the police and maybe worse than just a visit? The answer is that it all depends on how narrowly you
define the activities that you look for. Statisticians have seen this problem in many guises
and have a theory, which we introduce in the next section.
1.2.2 Bonferroni's Principle
Suppose you have a certain amount of data, and you look for events of a certain type within
that data. You can expect events of this type to occur, even if the data is completely random,
and the number of occurrences of these events will grow as the size of the data grows.
These occurrences are “bogus,” in the sense that they have no cause other than that random
data will always have some number of unusual features that look significant but aren't. A
theorem of statistics, known as the Bonferroni correction, gives a statistically sound way to
avoid most of these bogus positive responses to a search through the data. Without going
into the statistical details, we offer an informal version, Bonferroni's principle, that helps
us avoid treating random occurrences as if they were real. Calculate the expected number
of occurrences of the events you are looking for, on the assumption that data is random. If
this number is significantly larger than the number of real instances you hope to find, then
you must expect almost anything you find to be bogus, i.e., a statistical artifact rather than
evidence of what you are looking for. This observation is the informal statement of
Bonferroni's principle.
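The calculation the principle calls for can be sketched in a few lines of Python. This is only an illustration under simple assumptions: every candidate event is taken to occur by chance independently with the same probability, so the expected number of chance occurrences is the number of candidates times that probability. The function names, the example numbers, and the factor of 10 used to stand in for "significantly larger" are our own hypothetical choices, not part of the principle itself.

```python
import random


def expected_chance_hits(num_candidates: int, p_chance: float) -> float:
    """Expected number of candidates that match purely by chance.

    Assumes each candidate matches independently with probability p_chance.
    """
    return num_candidates * p_chance


def mostly_bogus(expected_chance: float, expected_real: float,
                 factor: float = 10.0) -> bool:
    """Informal Bonferroni check.

    Returns True when chance occurrences are expected to outnumber real
    instances by more than `factor`. The factor is an illustrative
    stand-in for "significantly larger".
    """
    return expected_chance > factor * expected_real


def simulate_chance_hits(num_candidates: int, p_chance: float,
                         seed: int = 0) -> int:
    """Count chance matches in one draw of completely random data."""
    rng = random.Random(seed)
    return sum(rng.random() < p_chance for _ in range(num_candidates))


if __name__ == "__main__":
    # Hypothetical numbers: 200,000 candidates, each matching by chance
    # with probability 0.0005, and 2 real instances hoped for.
    chance = expected_chance_hits(200_000, 5e-4)
    print("expected chance hits:", chance)
    print("one simulated count:", simulate_chance_hits(200_000, 5e-4))
    print("mostly bogus?", mostly_bogus(chance, expected_real=2.0))
```

If the check returns True, the event definition is too loose for the amount of data searched. Defining the event more narrowly, which lowers the chance probability, is the only parameter in this sketch that brings the expected count of bogus hits back down.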
In a situation like searching for terrorists, where we expect that there are few terrorists
operating at any one time, Bonferroni's principle says that we may only detect terrorists by
looking for events that are so rare that they are unlikely to occur in random data. We shall
give an extended example in the next section.