Putting It All Together (Digital Library)

This section tours the Massachusetts Institute of Technology (MIT) institutional repository—a representative, publicly available digital library system—to illustrate the basic shape and form of many digital libraries today. The focus here is on the end-user’s perspective. In Section 7.6 we revisit this example to explore it from a digital librarian’s point of view.

An institutional repository

Figure 3.22a shows the home page of MIT’s institutional repository, which is implemented in the DSpace system (see Notes and sources). It is a gateway to 30,000 digital items: technical reports, working papers, preprints, and theses.


Figure 3.21: Facets in the Australian Newspapers project: (a) searching for the term Waikato across all newspapers; (b) exploring using the Title facet



Figure 3.22: DSpace installation at MIT: (a) home page; (b) query result for Stallman;


Figure 3.22, cont’d: (c) document overview; (d) the document itself;


Figure 3.22, cont’d: (e) searching by author for Stallman; (f) community and collections hierarchy;


Figure 3.22, cont’d: (g) browsing by author; (h) pre-filtering authors by ann;


Figure 3.22, cont’d: (i) articles by Annan, Kofi

The repository is organized as a hierarchy: communities are the uppermost level, and they are hierarchically grouped into subcommunities and so forth, with individual collections at the bottom. Searching can be conducted at any level.
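The hierarchy described above can be sketched as a small tree in which a search started at any node covers only the subtree below it. This is a minimal illustration, not how DSpace itself stores or searches its data: the `Community` and `Collection` classes, the `search` method, and every item title are hypothetical, and only the community and collection names that appear elsewhere in this section are real.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Collection:
    """Bottom level of the hierarchy: a named list of item titles (hypothetical model)."""
    name: str
    items: List[str] = field(default_factory=list)


@dataclass
class Community:
    """Upper levels: a community may hold subcommunities and collections."""
    name: str
    subcommunities: List["Community"] = field(default_factory=list)
    collections: List[Collection] = field(default_factory=list)

    def iter_items(self) -> Iterator[str]:
        """Yield every item title below this community, at any depth."""
        for collection in self.collections:
            yield from collection.items
        for sub in self.subcommunities:
            yield from sub.iter_items()

    def search(self, term: str) -> List[str]:
        """Case-insensitive substring search restricted to this subtree."""
        needle = term.lower()
        return [title for title in self.iter_items() if needle in title.lower()]


# Item titles below are invented placeholders.
ai_memos = Collection("AI Memos", ["Sample EMACS manual", "Sample vision memo"])
ai_lab = Community("Artificial Intelligence Lab Publications", collections=[ai_memos])
csail = Community("Computer Science and Artificial Intelligence Lab", subcommunities=[ai_lab])
theses = Collection("Sample theses", ["Sample thesis on EMACS usage"])
other = Community("Sample community", collections=[theses])
root = Community("Repository", subcommunities=[csail, other])

print(root.search("emacs"))   # searches the whole repository
print(csail.search("emacs"))  # searches one community only
```

Starting the search at `root` covers everything, while starting at `csail` narrows the result to that community, which mirrors the "search at any level" behavior.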

The left-hand margin of the screen provides access to all levels of the community hierarchy, plus browsing by date, author, and subject. Below this is where users register and log in. Searching and browsing are unrestricted, but an account is needed for services such as requesting e-mail alerts. Registration is open to all, but MIT affiliates are distinguished from others: they are granted access to restricted items and can submit new items to the repository. The bottom two links in the margin connect users to RSS feeds, which announce new items as they are added. Two variants are supported: RSS 1.0 and RSS 2.0 (depending on whom you ask, the letters stand for really simple syndication, rich site summary, or RDF site summary). Both are XML-based (see Section 4.3 for more information on XML and Section 6.4 for a description of RDF).
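To make the difference between the two feed variants concrete, here is a minimal sketch that reads item titles and links from either one using Python's standard `xml.etree.ElementTree` module. The two sample feeds, their `example.org` URLs, and the `parse_feed` function are invented for illustration and are not taken from MIT's repository. The sketch relies on two structural facts: RSS 2.0 nests un-namespaced `item` elements inside `channel`, whereas RSS 1.0 places namespaced `item` elements directly under the RDF root.

```python
import xml.etree.ElementTree as ET

# Namespace used by RSS 1.0 elements, in ElementTree's {uri}tag notation.
RSS1_NS = "{http://purl.org/rss/1.0/}"

# Hypothetical sample feeds (not real repository output).
RSS2_SAMPLE = """
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <link>http://example.org/</link>
    <description>Recently added items</description>
    <item>
      <title>Sample report</title>
      <link>http://example.org/item/1</link>
    </item>
  </channel>
</rss>
""".strip()

RSS1_SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/feed">
    <title>Sample feed</title>
    <link>http://example.org/</link>
    <description>Recently added items</description>
  </channel>
  <item rdf:about="http://example.org/item/2">
    <title>Sample thesis</title>
    <link>http://example.org/item/2</link>
  </item>
</rdf:RDF>
""".strip()


def parse_feed(xml_text):
    """Return (title, link) pairs from an RSS 1.0 or RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # RSS 2.0: plain element names, items nested inside <channel>.
        prefix = ""
        items = root.findall("./channel/item")
    else:
        # RSS 1.0: namespaced elements, items are children of the RDF root.
        prefix = RSS1_NS
        items = root.findall(f"{RSS1_NS}item")
    return [
        (item.findtext(f"{prefix}title"), item.findtext(f"{prefix}link"))
        for item in items
    ]


print(parse_feed(RSS2_SAMPLE))
print(parse_feed(RSS1_SAMPLE))
```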

A user interested in the work of Richard Stallman, founder of the GNU Project and the Free Software Foundation, enters the query stallman in the main search box. Figure 3.22b shows the result: 97 matches. The first hit is a 1981 manual for the open-source EMACS text editor developed by Stallman, and clicking this link brings up Figure 3.22c, a document surrogate displaying basic metadata: title, author, abstract, and issue date. The collection (AI Memos) is displayed at the top, along with the associated community (Computer Science and Artificial Intelligence Lab, Artificial Intelligence Lab Publications).

The user clicks a link to view the document. Once it has downloaded (26 MB) it appears in a new window, Figure 3.22d. Although the document was born digital (as the manual itself points out, it can be viewed online as part of the Emacs help system or purchased in print for $3.25), this item in the collection is derived from a scanned copy. This is not surprising: the document was authored a year before the inception of PostScript (described in Section 4.5).

The fact that the document is a scan accounts for its vast download size and the evident graininess of the pages. It has not undergone OCR and can be located only through metadata, not full-text search. However, the result set does include later documents for which a searchable version was generated directly from the computer file; these are included because their full text also contains stallman. In fact, all matching full-text items turn out to be by other authors referencing Stallman's work. An "advanced" facility in the digital library can be used to restrict the search to the author's name, and the more carefully constructed query in Figure 3.22e returns only eight matching documents.
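The contrast between the simple query and the author-restricted query can be sketched as follows. This is an illustrative model only, not the repository's actual search implementation: the `Record` class, the two matching functions, and all three sample records are hypothetical. A record whose `full_text` is `None` stands in for a scanned item that has not undergone OCR, so it can be reached only through its metadata.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical repository record with metadata and optional full text."""
    title: str
    authors: List[str]
    full_text: Optional[str] = None  # None models a scanned item without OCR


def matches_anywhere(record: Record, term: str) -> bool:
    """Simple search: match the term in the metadata or in the full text."""
    needle = term.lower()
    metadata = " ".join([record.title, *record.authors]).lower()
    text = (record.full_text or "").lower()
    return needle in metadata or needle in text


def matches_author(record: Record, term: str) -> bool:
    """Restricted search: match the term in the author field only."""
    needle = term.lower()
    return any(needle in author.lower() for author in record.authors)


# Invented sample records.
RECORDS = [
    Record("Scanned editor manual", ["Stallman, Richard"]),
    Record("Later report", ["Doe, Jane"], "This report builds on work by Stallman."),
    Record("Unrelated memo", ["Roe, Alex"], "Nothing relevant here."),
]

simple_hits = [r.title for r in RECORDS if matches_anywhere(r, "stallman")]
author_hits = [r.title for r in RECORDS if matches_author(r, "stallman")]
print(simple_hits)  # includes the report that merely cites the author
print(author_hits)  # only the item the author actually wrote
```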

Next the user clicks the Communities and Collections link on the left to examine the complete repository (Figure 3.22f), listed alphabetically by top-level community and indented to convey the hierarchical structure. The ordering continues recursively within communities, so users can drill down and commence queries at any level. The list is very long, so our user elects to browse by author instead, clicking the appropriate link to bring up Figure 3.22g. Here the user can jump immediately to any letter of the alphabet using the navigation bar, and then step through the authors using Next page and Previous page links. Twenty authors are displayed per page; this can be changed to 5, 10, 40, 60, 80, or 100 using a pull-down menu. The sort order can also be changed from ascending to descending. Pressing Update reloads the page with the new setting.
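The paging controls just described amount to sorting the author list and slicing out one page. The sketch below assumes that model; the `browse_page` function and the generated author names are hypothetical, and only the page sizes come from the text.

```python
from typing import List, Sequence

# Page sizes offered by the pull-down menu, including the default of 20.
ALLOWED_PAGE_SIZES = (5, 10, 20, 40, 60, 80, 100)


def browse_page(
    authors: Sequence[str],
    page: int = 1,
    per_page: int = 20,
    descending: bool = False,
) -> List[str]:
    """Return one page of authors in the requested sort order (1-based pages)."""
    if per_page not in ALLOWED_PAGE_SIZES:
        raise ValueError(f"per_page must be one of {ALLOWED_PAGE_SIZES}")
    if page < 1:
        raise ValueError("page numbers start at 1")
    ordered = sorted(authors, key=str.casefold, reverse=descending)
    start = (page - 1) * per_page
    return ordered[start:start + per_page]


# Invented author names: "Author 000" through "Author 044".
sample_authors = [f"Author {i:03d}" for i in range(45)]

first_page = browse_page(sample_authors)                         # default settings
last_page = browse_page(sample_authors, page=3)                  # partial final page
reversed_page = browse_page(sample_authors, per_page=5, descending=True)
print(len(first_page), len(last_page), reversed_page[0])
```

A page number past the end simply yields an empty list, which a real interface would presumably handle by disabling the Next page link.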

As noted earlier, alphabetic browsing does not scale well. MIT's repository has 25,000 authors, and most letters are tedious to click through even 100 items at a time. However, users can enter the first few letters on the browsing page to filter the information displayed. To get to Figure 3.22h our user typed ann, and then selected the author Annan, Kofi to yield Figure 3.22i, which shows a list of all the articles by the former UN Secretary-General. In this case there is just one: a 1972 Master's thesis at the Sloan School of Management entitled "International joint venture with a government partner case study: copper mining in Zambia." The document is viewed just as before, but only registered MIT users can print it because it belongs to the Thesis collection community. For other users, a link is provided to purchase a printable copy.
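One common way to make such prefix filtering fast on a long sorted list is a binary search for the first matching name, followed by a short scan over the matches. The sketch below assumes that approach and does not claim to reflect how the repository implements it; the `AuthorIndex` class is hypothetical, and every name except "Annan, Kofi" is invented.

```python
from bisect import bisect_left
from typing import Iterable, List


class AuthorIndex:
    """Hypothetical sorted author list supporting case-insensitive prefix lookup."""

    def __init__(self, names: Iterable[str]) -> None:
        # Sort once and keep the casefolded keys alongside the display names.
        self.names = sorted(names, key=str.casefold)
        self.keys = [name.casefold() for name in self.names]

    def with_prefix(self, prefix: str) -> List[str]:
        """Return all names that start with the given letters."""
        needle = prefix.casefold()
        # Binary search finds the first key that is >= the prefix.
        start = bisect_left(self.keys, needle)
        end = start
        # Matching names are contiguous in sorted order, so scan until they stop.
        while end < len(self.keys) and self.keys[end].startswith(needle):
            end += 1
        return self.names[start:end]


index = AuthorIndex([
    "Baker, Sam",
    "Annis, Lee",
    "Anderson, Pat",
    "Annan, Kofi",
    "Ansel, Kim",
])

print(index.with_prefix("ann"))
```

With 25,000 authors the binary search needs only about 15 comparisons to find the starting point, compared with paging through hundreds of screens by hand.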
