organizations worldwide to open up digital representations of their holdings (unless
prohibited by copyright or other barriers) for public access as well.
What it means to make these materials “open” is a critical question. What is starting to happen is that traditional human readers, who flip virtual pages or study drawings on screen and are not very computationally demanding, are being joined by a menagerie of computer programs that want to do text mining, data mining, image recognition, cross-comparison, indexing, and all kinds of fascinating things, some of which are relatively well understood and some of which are rapidly evolving, experimental, cutting-edge research technologies. These software instruments are showing up and wanting access to large collections of data; they are joining human readers in looking through large collections of text, images, video, and other material designed for more direct human apprehension. So all of a sudden, these ideas about open access are taking on the problematic dimension of computational provisioning. As a content provider, it is no longer enough to say, "I need enough disk space to house the data I am obligated to provide access to, plus some relatively light accompanying computing capability to let human beings look at it." Now, to really deliver meaningful, comprehensive open access, you need to be able to provision a lot of computational capacity, and in many cases come up with methods for rationing its use, since the demand seems open-ended.
Many areas - certainly many scholarly disciplines or sub-disciplines - are simply getting too big and growing too fast for humans to cope; we now see measures of disciplinary literature growth counted in articles per minute. Nobody is going to keep up with the literature of a sub-discipline that is growing at a rate of even one article every ten minutes. Or consider the mismatch between upload and view rates on something like YouTube: something on the order of a hundred hours' worth of video is uploaded every minute of every day. These are numbers that basically say that without a lot of computational capacity, you cannot cope effectively with existing knowledge bases or content collections, you cannot analyze developments, and you cannot allocate your limited reading time well.
Consider the problem of the historian of the 21st century: say they are trying to write a history of one of the recent presidents of the United States. The issue is not whether they can get access to the material; the issue is that there is more material than they can read in five lifetimes, and they need a lot of computational help to classify it and to identify the relevant parts. Or imagine the special collections of the 21st century that are being accessioned in libraries right now: in the past, a library or archive might get a few dozen cartons of somebody's papers and correspondence. Today that personal collection might involve thirty-five years of collected email, hundreds of thousands of messages, and a pile of disk drives full of documents. The issue, again, is that there is no way human beings are going to dig through all of that; in the future, it will be historians and biographers and political scientists and other scholars, partnered with computational tools, who interact with these special collections.