use of standardized resources means that there is direct control of the principal variables, and experiments are comparable between research groups; existing published results provide a baseline against which new results can be directly compared.
By the standards of computer science, the TREC experiment is expensive, with, for example, some months of assessor time required every year. However, TREC illustrates that robust experiments can have high impact. When TREC began (in 1992, a year or two before the Web began to be significant), there was a large range of competing theories about the best way to match documents to queries. Weak methods were rapidly culled by TREC, and a great many dramatic improvements in information retrieval were spurred by the opportunity that TREC created. The Web search engines drew substantial inspiration from the TREC work and, in contrast to some other areas of computer science, the links between academia and industry remain strong. This impact could not have been achieved without the large-scale involvement of human assessors, or without the commitment to robust experimentation.
Coding for Experimentation
In computer science research, in principle at least, the sole reason for coding is to
build tools and probes for generating, observing, or measuring phenomena. Thus the
choice of what to measure guides the process of coding and implementation—or,
perhaps, indicates what does not have to be coded.
The basic rule is to keep things simple. If efficiency is not being measured, for example, don't waste time squeezing cycles from code. If a database join algorithm is being measured, it may not be necessary to implement indexes, and it is almost certainly unnecessary to write an SQL interpreter. All too often, computer scientists get distracted from the main task of producing research tools and instead, for example, develop complete systems.
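For instance, the join algorithm itself can be timed directly over small synthetic in-memory relations, with no SQL interpreter or index structures around it. The following Python sketch is only an illustration of this kind of minimal harness; the data, sizes, and names are invented for the example.

    import random
    import time

    def hash_join(left, right, key):
        # Build a hash table on the left relation, then probe it with the right.
        table = {}
        for row in left:
            table.setdefault(row[key], []).append(row)
        return [(l, r) for r in right for l in table.get(r[key], [])]

    # Synthetic relations: no SQL layer, no indexes, just the operation under study.
    left = [{"id": i, "val": "x"} for i in range(100_000)]
    right = [{"id": random.randrange(100_000), "val": "y"} for _ in range(100_000)]

    start = time.perf_counter()
    pairs = hash_join(left, right, "id")
    print(f"{len(pairs)} pairs joined in {time.perf_counter() - start:.3f} s")

A few dozen lines of this kind are enough to produce the measurement of interest; everything else in a full database system would be effort spent on code that is never measured.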
In coding for an experiment, there are several other such rules or guidelines that
might seem obvious, but which are often not followed. Examples include:
• One task, one tool: decompose the problem into separate pieces of code. In most cases, trying to create a single piece of code that does everything is just not productive. Do you need to integrate the data classifier into the network generator, and the network generator into the visualizer? Wouldn't it have been easier to develop them independently and combine them with a script? (A minimal sketch of such a script follows this list.)
• Be aware that you may need to trade ease of implementation against realism of the result. Can load balancing across distributed machines on a network be examined without development of significant software infrastructure? Can an algorithm be assessed if all data is held in memory, or is it necessary, for realism, to manage data on disk, perhaps in a custom-built file system? Hard-coding of data structures, input formats, and so on, may allow for rapid implementation, but does it lead to unrealistic behaviour or simplifications?
• Cut the right corners. Coding for a day to save an hour's manual work is a waste of time, even if coding is the more principled approach. But coding for an hour to save a day of manual work is clearly worthwhile.
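As a sketch of the "one task, one tool" guideline, the glue between separately developed programs can be a few lines of Python. The tool names below (classify.py, build_network.py, visualize.py) and their file-based interfaces are hypothetical, invented only to show the shape of such a driver script.

    import subprocess

    # Each hypothetical tool does one job and communicates through files;
    # a short driver replaces a monolithic classifier-generator-visualizer.
    subprocess.run(["python", "classify.py", "raw_data.csv", "labels.csv"], check=True)
    subprocess.run(["python", "build_network.py", "labels.csv", "network.json"], check=True)
    subprocess.run(["python", "visualize.py", "network.json", "network.pdf"], check=True)

Keeping the tools separate also means that each stage can be rerun, replaced, or checked in isolation when an experiment needs to be varied or repeated.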