1.2.4 Exercises for Section 1.2
Exercise 1.2.1 : Using the information from Section 1.2.3, what would be the
number of suspected pairs if the following changes were made to the data (and
all other numbers remained as they were in that section)?
(a) The number of days of observation was raised to 2000.
(b) The number of people observed was raised to 2 billion (and there were
therefore 200,000 hotels).
(c) We only reported a pair as suspect if they were at the same hotel at the
same time on three different days.
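The estimate from Section 1.2.3 can be parameterized so that each change in Exercise 1.2.1 becomes a single argument. The sketch below is a minimal one. It assumes the baseline figures of Section 1.2.3: one billion people, 1000 days, 100,000 hotels, and a 1-in-100 chance that a person visits some hotel on a given day. The function name `expected_suspect_pairs` and its parameters are hypothetical.

```python
from math import comb

def expected_suspect_pairs(people, days, hotels, visit_prob=0.01, same_days=2):
    """Expected number of innocent pairs seen at the same hotel on `same_days` days.

    Assumption (following Section 1.2.3): each person independently visits a
    hotel on a given day with probability `visit_prob`, choosing one of
    `hotels` hotels uniformly at random.
    """
    # Two given people meet on a given day if both visit (visit_prob ** 2)
    # and the second picks the same hotel as the first (1 / hotels).
    p_same_day = visit_prob * visit_prob / hotels
    # Count the pairs of people and the sets of `same_days` distinct days,
    # then multiply by the chance that the pair meets on every one of those days.
    return comb(people, 2) * comb(days, same_days) * p_same_day ** same_days

baseline = expected_suspect_pairs(people=10**9, days=1000, hotels=100_000)
print(f"{baseline:,.0f}")
```

With exact pair counts the baseline comes out just under 250,000. Section 1.2.3 reaches 250,000 by approximating n(n-1)/2 as n²/2. For the exercise, part (a) changes `days`, part (b) changes `people` and `hotels`, and part (c) sets `same_days=3`.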
! Exercise 1.2.2 : Suppose we have information about the supermarket purchases
of 100 million people. Each person goes to the supermarket 100 times
in a year and buys 10 of the 1000 items that the supermarket sells. We believe
that a pair of terrorists will buy exactly the same set of 10 items (perhaps the
ingredients for a bomb?) at some time during the year. If we search for pairs of
people who have bought the same set of items, would we expect that any such
people found were truly terrorists?[3]
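Exercise 1.2.2 can be approached with the same kind of expected-count reasoning. The sketch below is minimal and rests on a strong idealization that the exercise does not state: every basket is an independent, uniformly random choice of 10 of the 1000 items. The function name `expected_identical_basket_pairs` is hypothetical.

```python
from math import comb

def expected_identical_basket_pairs(people, trips_per_person, items, basket_size):
    """Expected number of coincidences in which two different people buy
    identical baskets.

    Idealization (an assumption): every basket is an independent, uniformly
    random `basket_size`-subset of the `items` for sale.
    """
    possible_baskets = comb(items, basket_size)
    # One pair of people yields trips_per_person ** 2 pairs of baskets,
    # and each of those is identical with probability 1 / possible_baskets.
    match_prob_per_pair = trips_per_person ** 2 / possible_baskets
    return comb(people, 2) * match_prob_per_pair

expected = expected_identical_basket_pairs(
    people=100_000_000, trips_per_person=100, items=1000, basket_size=10)
print(f"{expected:.2e}")
```

Real purchases are far from uniform. When shoppers favor the same popular items, identical baskets become more likely, so treat this figure as an idealized baseline rather than a prediction for real data.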
1.3 Things Useful to Know
In this section, we offer brief introductions to subjects that you may or may
not have seen in your study of other courses. Each will be useful in the study
of data mining. They include:
1. The TF.IDF measure of word importance.
2. Hash functions and their use.
3. Secondary storage (disk) and its effect on running time of algorithms.
4. The base e of natural logarithms and identities involving that constant.
5. Power laws.
1.3.1 Importance of Words in Documents
In several applications of data mining, we shall be faced with the problem of
categorizing documents (sequences of words) by their topic. Typically, topics
are identified by finding the special words that characterize documents about
that topic. For instance, articles about baseball would tend to have many
occurrences of words like “ball,” “bat,” “pitch,” “run,” and so on. Once we
[3] That is, assume our hypothesis that terrorists will surely buy a set of 10 items in common
at some time during the year. We don't want to address the matter of whether or not terrorists
would necessarily do so.