sandbox you into some kind of virtual machine environment that is very carefully
constrained and isolated - but this is hard to do, and a big security risk, and they do
not have the computational capacity anyway, so running your “programs” on the
repository site is unlikely in many cases. Or they will support a small, constrained set
of high-level queries where they can bound the computational demand and the
functionality of the queries.
(I will note in passing another issue. Literatures are, and will be, most typically
scattered across large numbers of repositories or libraries. So from the user's
perspective, one is not working with a single source of content but with many in parallel. This
also changes the tradeoffs and indeed even the feasibility of shipping queries to the
data rather than copying the aggregated data somewhere and computing on it.)
There is a simmering debate that comes down to this: are open-access
articles going to be liquid and mobile, or are the tools of text mining going to be,
for most practical purposes, defined by the publishing community because you will
need to run those tools in their environment? Will publishers let you run only
specific tools, or charge you extra if you want to run other, computationally
intensive tools? They may choose to let you run only things that are fairly
inexpensive. You can ask the same question not just about publishers (and other
repositories) with regard to articles but also more generally about cultural memory
organizations and the materials that they house. The Library of Congress got some
publicity about a year and a half ago, when it was announced that they were going to
preserve and host the Twitter archive. So they have now got these data feeds coming
in from Twitter. Housing them on disk is only a moderate problem; if you talk to
the people there, the really intractable problem is how to provide meaningful access
to this resource, because of the scale of computational provisioning necessary. It is a
question of where the Library of Congress gets the computational capacity to deal
with the kind of queries that people are going to want to run across this database,
which are not simply “show me tweet 21,000,000,992”; they are going to be asking
questions about the nature of the social graph, retweet patterns, and things that are
genuinely expensive to compute.
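To make that contrast concrete, here is a minimal sketch in Python. The file layout, field names, and function names are hypothetical, purely illustrative, and not any actual Library of Congress or Twitter interface: a lookup of a single tweet by identifier can stop as soon as it finds a match (or be answered from an index), while even a simple retweet-pattern question forces a pass over the entire archive, which at the scale of the Twitter feed is exactly the provisioning problem described above.

```python
import json
from collections import Counter

# Hypothetical layout: the archive as newline-delimited JSON, one tweet per
# line, with "id", "user", and an optional "retweeted_user" field.

def fetch_tweet(path, tweet_id):
    # Cheap query: with an index on tweet id this is a single lookup;
    # even this naive scan can stop as soon as the tweet is found.
    with open(path) as f:
        for line in f:
            tweet = json.loads(line)
            if tweet["id"] == tweet_id:
                return tweet
    return None

def retweet_in_degrees(path):
    # Expensive query: counting how often each user is retweeted requires
    # reading every tweet in the archive; there is no shortcut, and at the
    # scale of the full Twitter feed that demands serious provisioning.
    counts = Counter()
    with open(path) as f:
        for line in f:
            source = json.loads(line).get("retweeted_user")
            if source:
                counts[source] += 1
    return counts
```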
So back to the clouds. Can we do some of this computation in the clouds, where at
least in some cases there is already a public market in computational resources and an
infrastructure (albeit a heavy-handed one) for isolating users? Can we imagine an
environment where, if you really want to dig into the treasures of a poorly funded
cultural heritage organization (or maybe not even a poorly funded one, just one without
infinite resources), the deal is that you buy some computing cycles in the same
cloud that the cultural heritage organization occupies, so that the data transfer
is manageable within that cloud? (In most cases, if it is a big collection of
data, you certainly cannot download it yourself, because the consumer broadband
infrastructure either cannot handle it or prices it out of reach.)
There are some other interesting variations one can imagine here. In some parts of
the United States now (and, I think, in other countries as well), we are asking questions
about the role of libraries, and particularly public libraries, in society going forward.
We are also asking how much bandwidth a public library should have
and what it would do with it. There are at least a few experiments that are starting