to the intrusion on the lives of half a million innocent people, the work involved is
sufficiently great that this approach to finding evil-doers is probably not feasible.
1.2.4 Exercises for Section 1.2
EXERCISE 1.2.1 Using the information from Section 1.2.3, what would be the number of
suspected pairs if the following changes were made to the data (and all other numbers
remained as they were in that section)?
(a) The number of days of observation was raised to 2000.
(b) The number of people observed was raised to 2 billion (and there were therefore
200,000 hotels).
(c) We only reported a pair as suspect if they were at the same hotel at the same time on
three different days.
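The count asked for in Exercise 1.2.1 can be explored with a short calculation. The sketch below is a minimal illustration, not the book's worked solution. It assumes the Section 1.2.3 model, whose baseline figures are not restated in this excerpt: one billion people, 1000 days of observation, a 1% chance that a person visits a hotel on any given day, and 100,000 hotels chosen uniformly at random. The function name `expected_suspect_pairs` and its parameters are hypothetical.

```python
from math import comb


def expected_suspect_pairs(people, days, hotels, visit_prob=0.01, meetings=2):
    """Expected number of pairs seen at the same hotel on `meetings` distinct days.

    Assumes every person visits a hotel independently with probability
    `visit_prob` each day and picks one of `hotels` uniformly at random.
    """
    # Chance that two given people both visit a hotel on one given day
    # and happen to choose the same hotel.
    same_hotel_one_day = visit_prob ** 2 / hotels
    # (pairs of people) x (sets of `meetings` days) x (chance the pair
    # coincides on every one of those days).
    return comb(people, 2) * comb(days, meetings) * same_hotel_one_day ** meetings


# Assumed baseline from Section 1.2.3 (not restated in this excerpt).
baseline = expected_suspect_pairs(people=10**9, days=1000, hotels=10**5)
print(round(baseline))
```

Under the assumed baseline the result is close to 250,000 pairs, which is consistent with the half million people mentioned above. Changing one argument at a time (`days`, `people` together with `hotels`, or `meetings`) gives the variations in parts (a) through (c).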
! EXERCISE 1.2.2 Suppose we have information about the supermarket purchases of 100
million people. Each person goes to the supermarket 100 times in a year and buys 10 of the
1000 items that the supermarket sells. We believe that a pair of terrorists will buy exactly
the same set of 10 items (perhaps the ingredients for a bomb?) at some time during the year.
If we search for pairs of people who have bought the same set of items, would we expect
that any such people found were truly terrorists? 3
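A useful first step for Exercise 1.2.2 is to count how many distinct 10-item baskets exist. The sketch below is only a starting point under a loudly stated assumption that the exercise does not spell out: every shopper picks the 10 items uniformly at random and independently of everyone else. The constant names are hypothetical.

```python
from math import comb

ITEMS_SOLD = 1000      # items the supermarket sells (from the exercise)
ITEMS_PER_TRIP = 10    # items bought on each trip (from the exercise)

# Number of distinct 10-item baskets a shopper could assemble.
possible_baskets = comb(ITEMS_SOLD, ITEMS_PER_TRIP)

# Assumption: baskets are chosen uniformly at random, so two independent
# trips contain exactly the same items with probability 1 / possible_baskets.
match_probability = 1 / possible_baskets

print(f"{possible_baskets:.3e} baskets, match probability {match_probability:.3e}")
```

Comparing this probability with the number of pairs of shopping trips in the data indicates how many coincidental matches to expect among innocent shoppers.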
1.3 Things Useful to Know
In this section, we offer brief introductions to subjects that you may or may not have seen
in your study of other courses. Each will be useful in the study of data mining. They
include:
(1) The TF.IDF measure of word importance.
(2) Hash functions and their use.
(3) Secondary storage (disk) and its effect on running time of algorithms.
(4) The base e of natural logarithms and identities involving that constant.
(5) Power laws.