Working with Digital Collections

What is the point of your digital collection? Or, more precisely, why would anyone want to access the content? The beauty of libraries is that, no matter how you answer such questions, it is impossible to predict who will find your collection useful, or when, or how. The variety of possible interactions leads to the question, What exactly will users want to do with your digital collection?

The take-home message is to understand what services you intend your digital library to provide to its users. To do this, you need to understand how users will interact with the content, and balance your desire to support them against the technical and organizational demands of providing ever-richer services.

Using information from digital libraries

A simple answer is that users want to read the documents—or, in the case of multimedia items, view, listen to, and interact with them. A more nuanced answer might take into account what users are trying to achieve. Do they want to save the digital object to their own personal workspace? Do they want to share it with others? Do they want to cut out a portion and use it in a new document of their own? Do they want to provide a link from their Web site to an item in the digital library?

Although it is possible to extract portions of paper documents, doing so often risks damaging the original—and incurring the wrath of librarians. You can quote text from books and papers, but quoting from multimedia objects like video or audio is much more difficult—especially if they are in analogue form. However, it is relatively easy to extract portions of digital items and re-use them. Furthermore, digital copying leaves the original content completely unaffected.

The copy-and-paste metaphor is familiar to anyone who has used a word processor or image editor. The same principle applies to audio and video, although the programs usually offer more controls. The ease of copying encourages users to extract material from a digital library and repurpose it in whatever ways they see fit. Once digital content is placed online, it can immediately be copied, legally or otherwise, and used again; then copied again, and so on. Furthermore, it can never be taken back, as AOL discovered when it released its users’ search logs in 2006.

Extracting content from a digital library can be made more difficult by using display technologies that attempt to restrict undesirable, unofficial, or illegal usage. (However, the downside is that material becomes less accessible to the intended users.) Some display formats have features that restrict their use. For example, some documents can be secured so that they can be viewed on screen but not printed (e.g., the PDF format described in Section 4.5). As a simple first step, a library should at least make the copyright status of its documents clear to end-users—although, as noted in Section 1.5, the legal considerations may differ from country to country.

Users of a digital library should be clear about any restrictions on how they can use the content. Examples include:

• Can an end-user display video content in a public venue?

• Can audio content be sampled or remixed into new musical works?

• Can search engines index the textual content?
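For the last question, one widely used machine-readable signal is a robots.txt file at the root of the library’s Web server, which tells well-behaved crawlers which paths they may fetch. It is advisory only: it does not prevent copying, and it should complement, not replace, a human-readable statement of the terms of use. The sketch below uses Python’s standard urllib.robotparser module to check such a file. The paths (/fulltext/, /private/), the host library.example.org, and the crawler name ExampleBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a digital library; the paths are invented.
# Item pages may be indexed, but full texts and private areas may not.
ROBOTS_TXT = """\
User-agent: *
Disallow: /fulltext/
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    for url in ("https://library.example.org/item/42",
                "https://library.example.org/fulltext/42.txt"):
        # can_fetch() reports what a crawler that honors robots.txt would do.
        print(url, rp.can_fetch("ExampleBot", url))
```

A crawler that honors the file would index the item page but skip the full text. Nothing in the protocol stops a crawler that chooses to ignore it.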

Some digital libraries offer users an opportunity to combine (or "remix") photographs, graphics, film clips, music, and text to create new multimedia displays. For example, a library might make available multimedia objects pertaining to some event of local or national significance and invite patrons to use a Web-based video tool to create their own expression of what the event means to them. The newly created artifact could then be entered into the digital library.

Referring to objects in a digital library

Another aspect of the "use" of a digital object is the extent to which it can be included in the Web’s link structure. Having found something of interest, a library user might bookmark it or link to its URL. But will the URL change if other items are added to the library? In other words, will it be persistent? Valid hyperlinks are the glue that holds the Web together, and references are much less effective if they break easily. The same question can be asked about searches (will the list of results for a particular search term in a digital library be persistent?) and about parts of the browsing structure the library provides (how persistent is the list of documents whose titles begin with the letter A?).

When evaluating digital library software, you should consider what happens to the URLs of objects in the library when:

• new items are added to a collection

• existing items are deleted

• the digital library’s Web server is reorganized

• the collection is moved to another computer

• the collection migrates to a different software system.
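The first two events matter most when a system mints URLs from an item’s position in some list rather than from a fixed identifier. The following sketch contrasts the two approaches under hypothetical assumptions: the base address library.example.org, the /item/ path, and the d0001-style identifiers are all invented for illustration, and real systems may use neither scheme.

```python
BASE = "https://library.example.org"  # hypothetical library address


def positional_urls(titles):
    """Fragile scheme: an item's URL is its position in the title-sorted list."""
    return {t: f"{BASE}/item/{n}" for n, t in enumerate(sorted(titles), start=1)}


def identifier_urls(records):
    """Stable scheme: an item's URL embeds an identifier assigned once, at ingest."""
    return {title: f"{BASE}/item/{ident}" for ident, title in records.items()}


if __name__ == "__main__":
    before = positional_urls(["Brahms", "Mozart"])
    after = positional_urls(["Brahms", "Mozart", "Bach"])  # one item added
    # The old link for "Mozart" now leads to a different item.
    print(before["Mozart"], "->", after["Mozart"])

    records = {"d0001": "Brahms", "d0002": "Mozart"}
    stable_before = identifier_urls(records)
    records["d0003"] = "Bach"  # adding an item leaves existing URLs alone
    print(stable_before["Mozart"], "->", identifier_urls(records)["Mozart"])
```

Adding "Bach" shifts every later item in the positional scheme, so a bookmark to item 2 silently points at a different work. Deleting an item has the same effect in reverse. Identifier-based URLs survive both events, though not, by themselves, a move to another server.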

As their name indicates, URLs (Uniform Resource Locators) are locators that specify how to find the file containing the information. Because they are locators, URLs do not usually survive radical events such as server reorganizations. Fortunately, it is possible to put a "redirect" on the Web server that automatically sends users to the new location (that is, a different URL) when the original URL changes.
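A redirect amounts to a table from old paths to new ones plus an HTTP response that carries the new location. The sketch below uses Python’s standard http.server module and answers with status 301 (Moved Permanently) and a Location header. The table entries are hypothetical, and collapsing chains of moves into a single hop is a design choice of this sketch, not a requirement; production sites would normally configure redirects in their Web server instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical redirect table: old path -> path the item moved to.
MOVED = {
    "/collections/music/item17.html": "/music/items/17",
    "/music/items/17": "/archive/music/17",  # the same item, moved again later
}


def redirect_target(path, moved=MOVED, max_hops=10):
    """Return the item's current path, or None if the path never moved.

    Chains are collapsed so that a browser needs only one redirect.
    """
    if path not in moved:
        return None
    for _ in range(max_hops):
        path = moved[path]
        if path not in moved:
            return path
    raise ValueError("redirect chain too long (possible loop)")


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404, "Not found")
            return
        self.send_response(301)  # 301 Moved Permanently
        self.send_header("Location", target)
        self.end_headers()


def serve(port=8000):
    """Run the redirect server until interrupted (not started on import)."""
    HTTPServer(("localhost", port), RedirectHandler).serve_forever()
```

Every move adds a row to the table, and someone has to maintain it. That burden is one reason redirects are only a partial answer to persistence.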

Many schemes have been devised to make identifiers of digital objects persist over time despite organizational changes in the underlying software systems, Web servers, and their location on the Web. We discuss these schemes in topic 7.
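The idea common to these schemes is a layer of indirection. Users cite an identifier served by a resolver, and only the resolver’s table records where the object currently lives. The class below is a minimal, hypothetical sketch of that idea. It is not the interface of any particular scheme (such as the Handle System, DOIs, or PURLs). The resolver address id.example.org and the identifier lib/0042 are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Resolver:
    """Hypothetical resolver mapping persistent identifiers to current locations."""
    base: str                                   # public address of the resolver
    table: dict = field(default_factory=dict)   # identifier -> current URL

    def register(self, ident: str, location: str) -> None:
        self.table[ident] = location

    def persistent_url(self, ident: str) -> str:
        # What users bookmark and cite: it names the resolver, not the server.
        return f"{self.base}/{ident}"

    def resolve(self, ident: str) -> str:
        return self.table[ident]

    def migrate(self, old_prefix: str, new_prefix: str) -> None:
        """The collection moved: rewrite stored locations, never identifiers."""
        for ident, loc in list(self.table.items()):
            if loc.startswith(old_prefix):
                self.table[ident] = new_prefix + loc[len(old_prefix):]
```

After a call such as migrate("https://old-server.example.org/", "https://new-server.example.org/"), every persistent URL that users have already cited stays the same, while resolve() returns the new location. The identifiers are only as durable as the resolver itself, so in practice the burden of persistence shifts to whoever runs it.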

Berry-picking

When people interact with information-retrieval systems, they typically encounter not just one, but several, items of interest that they wish to pursue further. Many systems provide mechanisms for users to maintain a cache of interesting items:

• bookmarks or favorites in Web browsers

• shopping carts in e-commerce stores

• marked or tagged lists of records in library catalogs.

Here we call such lists of interesting items berry baskets, a term that evokes picking the ripest and juiciest fruit from a bush. They are essential for the effective use of large collections. Figure 2.7 shows an example taken from the Library of Congress’s vast online catalog, in which a user is browsing the listed works of Noam Chomsky (there are over 190). Check boxes down the left-hand side are used to select items of interest, and our user, interested in the topic of propaganda, has selected the relevant works. Once satisfied with the selection (perhaps after visiting subsequent pages), she can save it, e-mail it to herself or a colleague, or print it as a paper record of the library visit. Commercial systems typically also provide the option to purchase the items on the list.


Figure 2.7: Berrybasket support provided by the Library of Congress’s online catalog
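Behind a screen like this, a berry basket can be as simple as an ordered, duplicate-free collection of record identifiers that persists while the user moves from page to page. The sketch below is hypothetical: the record identifiers are invented, and it says nothing about how the Library of Congress catalog actually stores its lists. It shows the picking, dropping, and plain-text export operations described above.

```python
from dataclasses import dataclass, field


@dataclass
class BerryBasket:
    """A user's picked records, in the order picked, without duplicates."""
    items: dict = field(default_factory=dict)  # record id -> title (dicts keep insertion order)

    def pick(self, record_id: str, title: str) -> None:
        self.items.setdefault(record_id, title)  # picking twice is harmless

    def drop(self, record_id: str) -> None:
        self.items.pop(record_id, None)  # dropping an absent record is harmless

    def as_text(self) -> str:
        """Numbered plain-text list, suitable for e-mailing or printing."""
        return "\n".join(
            f"{n}. {title} [{rid}]"
            for n, (rid, title) in enumerate(self.items.items(), start=1)
        )
```

Keying the basket by record identifier rather than by page position means a selection made on one results page survives navigation to the next, which is exactly what the user in the figure needs.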

Berry baskets raise the same questions about identity that were addressed in Section 2.2. Are users (and therefore their baskets) anonymous, or do they need to authenticate themselves? As knowledge work becomes increasingly collaborative, people find it useful to share their lists, either publicly or with specific groups of friends or colleagues.
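The identity question shapes what a basket can do. One plausible policy, sketched below under purely hypothetical rules, lets anonymous visitors keep a session-only basket but requires them to sign in before sharing it with named users or groups or making it public. The class, its fields, and the names in the usage are illustrative assumptions, not features of any particular catalog.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class SharedBasket:
    """Hypothetical sharing rules for a berry basket."""
    owner: str           # account name, or a session id for an anonymous visitor
    authenticated: bool  # False for anonymous, session-only baskets
    record_ids: list = field(default_factory=list)
    public: bool = False
    shared_with: set = field(default_factory=set)  # user or group names

    def _require_login(self) -> None:
        if not self.authenticated:
            raise PermissionError("sign in before sharing a basket")

    def share(self, principal: str) -> None:
        self._require_login()
        self.shared_with.add(principal)

    def make_public(self) -> None:
        self._require_login()
        self.public = True

    def can_view(self, viewer: str, viewer_groups: Iterable[str] = ()) -> bool:
        if self.public or viewer == self.owner:
            return True
        # Visible if shared with the viewer directly or with any of their groups.
        return viewer in self.shared_with or not self.shared_with.isdisjoint(viewer_groups)
```

Sharing with a group name rather than with each colleague individually keeps the list manageable as a team changes, at the cost of the library having to know who belongs to which group.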
