An example is what might be described as “universal” indexing methods. In
such methods, the object to be indexed—whether an image, movie, audio file, or
text document—is manipulated in some way, for example by a particular kind of
hash function. After this manipulation, objects of different type can be compared:
thus, somehow, documents about swimming pools and images of swimming pools
would have the same representation. Such matching is clearly an extremely difficult
problem, if not entirely insoluble; for instance, how does the method know to focus
on the swimming pool rather than some other element of the image, such as children,
sunshine, or its role as a metaphor for middle-class aspirations? 3
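To make the scheme concrete, here is a minimal sketch of what such a method might look like, assuming a hash-based manipulation; every name here is hypothetical, and the mapping step is precisely where the difficulty lies:

```python
# Hypothetical sketch of a "universal" indexing scheme. Objects of any
# media type are mapped into a single comparison space, and two objects
# "match" when their representations agree. The stand-in manipulation is
# a content hash, which exposes the gap: nothing in it knows that a text
# about pools and a photograph of a pool share a subject.
import hashlib

def to_common_space(obj: bytes) -> int:
    # Stand-in "manipulation"; a real system would need a semantic
    # mapping across media types, which is the unsolved part.
    return int.from_bytes(hashlib.sha256(obj).digest()[:8], "big")

def same_subject(a: bytes, b: bytes) -> bool:
    return to_common_space(a) == to_common_space(b)

document = b"Advice on maintaining a residential swimming pool."
image = b"\x89PNG (raw bytes of a photograph of a swimming pool)"
print(same_subject(document, image))  # False: the bytes differ, so the hashes differ
```

Any replacement for the hash that made this call return True for such inputs would have to solve exactly the focus-and-meaning problem described above.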
In some work, the evidence or methods are internally inconsistent. For example,
in a paper on how to find documents on a particular topic, the authors reported that
the method correctly identified 20,000 matches in a large document collection. But
this is a deeply improbable outcome. The figure of 20,000 hints at imprecision—it is
too round a number. More significantly, verifying that all 20,000 were matches would
require many months of effort. No mention was made of the documents that weren't
matches, implying that the method was 100% accurate; but even the best document-
matching methods have high error rates. A later paper by the same authors gave
entirely different results for the same method, while claiming similar good results
for a new method, thus throwing doubt on the whole research program. And it is a
failure of logic to suppose that the fact that two documents match according to some
arbitrary algorithm implies that the match is useful to a user.
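To see why the verification claim alone is implausible, a rough workload estimate suffices; the five minutes per document is my assumption:

\[
20\,000 \times 5\ \text{min} \approx 1\,667\ \text{h} \approx 208\ \text{eight-hour days},
\]

most of a working year, spent before a single non-match has been examined.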
The logic underlying some papers is outright mystifying. To an author, it may
seem a major step to identify and solve a new problem, but such steps can go too
far. A paper on retrieval for a specific form of graph used a new query language
and matching technique, a new way of evaluating similarity, and data based on a
new technique for deriving the graphs from text and semantically (that word again!)
labelling the edges. Every element of this paper was a separate contribution whose
merit could be disputed. Presented in a brief paper, the work seemed worthless.
Inventing a problem, a solution to the problem, and a measure of the solution—all
without external justification—is a widespread form of bad science. 4
(Footnote 2 continued)
the 7 kilobytes that such a modem could transmit per second. Uncompressed, the bandwidth of a
modem was only sufficient for one byte per row per image, or, per image, about the space needed to
transmit a desktop icon. A further skeptical consideration in this case was that an audio signal was
also transmitted. Had the system been legitimate, the inventor would have had to develop new
solutions to the independent problems of image compression, motion encoding, and audio compression.
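The arithmetic behind this skepticism can be reconstructed; a 56 kbit/s modem is consistent with the stated 7 kilobytes per second, while the figure of 25 images per second and an image height of roughly 280 rows are assumptions for illustration, since the opening of the footnote precedes this excerpt:

\[
\frac{56\,000\ \text{bit/s}}{8\ \text{bit/byte}} = 7\,000\ \text{bytes/s},
\qquad
\frac{7\,000\ \text{bytes/s}}{25\ \text{images/s}} = 280\ \text{bytes per image},
\]

about one byte per row for an image of some 280 rows, and comparable to a 16 × 16 icon at one byte per pixel (256 bytes).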
3 In another variant of this theme, objects of the same type were clustered together using some kind
of similarity metric. Then the patterns of clustering were analyzed, and objects that clustered in
similar ways were supposed to have similar subject matter. Although the use of clustering disguises
it, such an approach can succeed only if it rests on an underlying universal matching method.
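A minimal sketch of this variant, using invented data and a naive single-pass clustering, shows where the assumption hides: the within-type step is easy to write down, but the cross-type alignment has no implementation without a universal matching method.

```python
# Sketch of footnote 3's clustering variant (all names and data are
# hypothetical). Clustering within one media type needs only an
# ordinary within-type similarity; relating an image cluster to a text
# cluster is where a universal matching method would be required.
from typing import Callable, List

def cluster(objs: List[str], similar: Callable[[str, str], float],
            threshold: float) -> List[List[str]]:
    """Naive single-pass clustering: join the first cluster whose
    representative is similar enough, else start a new cluster."""
    clusters: List[List[str]] = []
    for obj in objs:
        for c in clusters:
            if similar(obj, c[0]) >= threshold:
                c.append(obj)
                break
        else:
            clusters.append([obj])
    return clusters

def text_similarity(a: str, b: str) -> float:
    # Within-type metric for texts: Jaccard overlap of word sets.
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def align(image_clusters: List[List[bytes]],
          text_clusters: List[List[str]]) -> None:
    # The step the approach quietly assumes: deciding that an image
    # cluster and a text cluster share subject matter is cross-type
    # comparison, i.e. universal matching.
    raise NotImplementedError("requires a universal matching method")

texts = ["pool safety rules", "children at the pool", "tax law reform"]
print(cluster(texts, text_similarity, threshold=0.15))
# [['pool safety rules', 'children at the pool'], ['tax law reform']]
```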
4 An interesting question is how to regard “Zipf's law”. This observation—“law” seems a poor
choice of terminology in this context—is if nothing else a curious case study. Zipf's books may
be widely cited but they are not, I suspect, widely read. In Human Behaviour and the Principle of
Least Effort (Addison-Wesley, 1949), Zipf used languages and word frequencies as one of several
examples to illustrate his observation, but his motivation for the work is not quite what might
be expected. He states, for example, that his research "define[s] objectively what we mean by the term
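For reference, since the observation itself is not stated above: applied to word frequencies it is usually written as

\[
f(r) \propto \frac{1}{r^{s}},
\]

where \(f(r)\) is the frequency of the \(r\)-th most frequent word and \(s \approx 1\) for English text, so the second-ranked word occurs roughly half as often as the first.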