I would just note parenthetically that there are also tough problems on the
curatorial side of many archives and special collections here; one of the big ones is
redaction. When the size of the print record was pretty small, you could find enough
human beings to go through and redact indiscreet things and pull items that should
stay private; in government settings you could make decisions about whether
something was classified and whether it could be declassified. Now you have just an
unmanageable problem when you start talking about things like government records
or personal papers. Can you let users compute on them and then select a few things
that a human curator might go through and appraise and redact if necessary? Can you
let that computation happen safely without too much implicit information leaking out
to cause trouble? These are strange and wonderful new areas of research that are
taking the stage as we struggle with this environment.
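To make this a little more concrete, here is a minimal sketch in Python of the kind of machine-assisted triage I have in mind: the computation flags items that might contain sensitive material so that a human curator only has to appraise and redact the flagged few. The patterns and file layout are illustrative assumptions of mine, not a real redaction workflow.

    import re
    from pathlib import Path

    # Very rough indicators of potentially sensitive content; these patterns
    # are assumptions for illustration, not a real redaction policy.
    SENSITIVE_PATTERNS = {
        "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
        "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    }

    def flag_for_review(collection_dir):
        """Yield (item name, kinds of matches) for items a curator should appraise."""
        for item in Path(collection_dir).glob("*.txt"):
            text = item.read_text(errors="ignore")
            hits = [name for name, pattern in SENSITIVE_PATTERNS.items()
                    if pattern.search(text)]
            if hits:
                yield item.name, hits

    # Hypothetical usage: print the handful of items needing human appraisal.
    for name, hits in flag_for_review("collection"):
        print(f"{name}: flagged for {', '.join(hits)}")

The interesting research questions sit on top of such crude passes: how much can be computed over the unflagged material, and how much implicit information leaks anyway.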
Finally, I want to really stress that the need for computational capability is not just
to permit humans to do the kind of access or research that they have traditionally done
in an environment where the amount of content available has grown unmanageable. It
is also central to being able to ask very new kinds of questions, whether about graphs
of social, intellectual, economic or other connectivity; about the outcomes of
inference or the identification of consensus or contradictions within very large
collections of text or data; about statistical correlations and the identification of
outliers. One could, for example, run an analysis of major collections of
Greek antiquities worldwide, attempting to computationally characterize an
“archetypal” version of a common kind of vase and the patterns of variation
around it, and then to link this to the geography of excavation sites.
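Purely to give a flavor of what such a computation might look like, here is a small sketch: reduce each vase to a vector of shape measurements, take the centroid as the “archetype”, and see how far the vases from each excavation region fall from it. The measurements, file format, and method are illustrative assumptions, not a description of any actual project.

    import csv
    from collections import defaultdict
    import numpy as np

    def load_measurements(path):
        """Read rows of (region, height, max_diameter, neck_ratio, handle_count)."""
        regions, features = [], []
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                regions.append(row["region"])
                features.append([float(row[k]) for k in
                                 ("height", "max_diameter", "neck_ratio", "handle_count")])
        return regions, np.array(features)

    regions, X = load_measurements("vase_measurements.csv")   # hypothetical file
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)             # put measurements on one scale

    archetype = X_norm.mean(axis=0)                           # the "archetypal" vase
    distances = np.linalg.norm(X_norm - archetype, axis=1)    # deviation of each vase

    # Average deviation per excavation region: a crude way to link
    # patterns of variation to geography.
    by_region = defaultdict(list)
    for region, d in zip(regions, distances):
        by_region[region].append(d)
    for region, ds in sorted(by_region.items()):
        print(f"{region}: mean deviation {np.mean(ds):.2f} over {len(ds)} vases")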
So let us return to this issue of access implying computation and connect it to
clouds. We are now moving into an environment where more and more kinds of
access actually require meaningful amounts of computation, and we have some hard
questions here. One is where the computation happens; the way we answer this
largely determines who chooses and provides the tools, and who sets the limits on
what you, as a user of information, can do. Let us look very quickly at a few scenarios.
Let us imagine that I want to do a complex computation over a substantial slice of
the recent literature in molecular biology - perhaps, say, 750,000 articles - or over a
big slice of the Twitter archive. One option is to download the material and do my
computations locally, if the publishers or other repositories housing it will let me.
That is not at all clear: some publishers do not agree, as a matter of policy, that this
kind of download-and-compute scenario is part of open access or something they need
to support in offering public access; I may not have the storage needed to hold all
these articles, or the bandwidth to my local resources that would let me download
them in reasonable time and at reasonable cost; and some repositories may not have
the computational provisioning even to support this kind of bulk downloading, or may
rate limit it to reduce the computational impact, which translates into a very long
download time. But given all these caveats, I could download the material and do my
computations locally.
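As a rough sketch of that last caveat - the repository endpoint, identifier scheme, and one-request-per-second limit below are purely hypothetical - even a cooperative bulk download has to pace itself against whatever rate limit the repository imposes, and at this scale the pacing alone dominates the elapsed time.

    import os
    import time
    import requests

    BASE_URL = "https://repository.example.org/articles"   # hypothetical endpoint
    REQUESTS_PER_SECOND = 1.0                               # hypothetical rate limit

    def bulk_download(article_ids, out_dir="corpus"):
        os.makedirs(out_dir, exist_ok=True)
        delay = 1.0 / REQUESTS_PER_SECOND
        for article_id in article_ids:
            response = requests.get(f"{BASE_URL}/{article_id}", timeout=30)
            response.raise_for_status()
            with open(os.path.join(out_dir, f"{article_id}.xml"), "wb") as f:
                f.write(response.content)
            time.sleep(delay)   # stay under the repository's rate limit

    # At one request per second, 750,000 articles take about 750,000 seconds
    # of wall-clock time - on the order of nine days of continuous downloading.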
Or, in theory, I could send my computation over to the repository. Well, how many
sites do you know that say “send me your arbitrary programs, I'm happy to run them
and see what they do on my site, what fun”? No, what they will do is they will