Metadata Browsing (Digital Library)

Browsing is often described as the other side of the coin from searching, but really the two are at opposite ends of a spectrum. One dictionary defines browsing as "inspecting in a leisurely and casual way," whereas searching is "making a thorough examination in order to find something." Other dictionaries have more verbose definitions. According to Webster’s, browsing includes

• looking over casually (as a book), skimming;

• skimming through a book, reading at random passages that catch the eye;

• looking over books (as in a store or library), especially in order to decide what one wants to buy, borrow, or read;

• casually inspecting goods offered for sale, usually without prior or serious intention of buying;

• making an examination without real knowledge or purpose.

The word browse originally referred to animals nibbling on grass or shoots, and its use in relation to reading, which is now far more widespread, appeared much later. Early in the 20th century, the library community adopted the notion of browsing as "a patron’s random examination of library materials as arranged for use" when extolling the virtues of open book stacks over closed ones—the symbolic snapping of the links of the chained book mentioned in Section 1.2.

Searching is purposeful, whereas browsing tends to be casual. Terms such as random, informal, unsystematic, and without design are used to capture the unplanned nature of browsing and, often, the lack of a specific goal. Searching implies that you know what you’re looking for, whereas browsing implies that you’ll know it when you see it. Often, browsers are far less directed than that—perhaps just casually passing time. But the distinction between searching and browsing is not clear—pedants are quick to point out that if, when searching, you really know what you’re looking for, then there’s no need to find it. The truth is that we do not have a good vocabulary to describe various degrees of browsing.


The metadata provided with the documents in a collection can support different browsing activities. Information collections that are entirely devoid of metadata can be searched—that is one of the real strengths of full-text searching—but they cannot be browsed in any meaningful way unless some additional data is present. The structure that is implicit in metadata is the key to providing browsing facilities. And now it’s time for some examples.

Lists

The simplest structure is the ordered list. Figure 3.18a shows a plain alphabetical list of document titles. Notice incidentally that the alphabetizing follows the common practice of ignoring articles like The and A at the beginning of titles. For some lists the ordering may use another practice: names are conventionally alphabetized by surname, even though they may be preceded by first name and initials.
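The two ordering conventions just described can be sketched as sort keys. This is a minimal illustration, not the method used to produce Figure 3.18: the article list, the function names, the sample titles, and the rule that the surname is the last word of a name are all assumptions made for the example.

```python
# Assumed set of leading articles to ignore; a real collection would choose its own.
LEADING_ARTICLES = ("the", "a", "an")

def title_sort_key(title: str) -> str:
    """Return a sort key that skips a leading article such as 'The' or 'A'."""
    words = title.strip().split(None, 1)
    if len(words) == 2 and words[0].lower() in LEADING_ARTICLES:
        return words[1].casefold()
    return title.strip().casefold()

def name_sort_key(name: str) -> tuple:
    """Sort a non-empty personal name by surname (assumed to be the last word)."""
    parts = name.strip().split()
    return (parts[-1].casefold(), " ".join(parts[:-1]).casefold())

# Hypothetical sample data.
titles = ["The Water Manual", "A Guide to Beekeeping", "Animal Husbandry", "Village Wells"]
names = ["J. R. Smith", "Ann Brown"]

sorted_titles = sorted(titles, key=title_sort_key)
sorted_names = sorted(names, key=name_sort_key)
```

Keeping the displayed title intact and changing only the key means "The Water Manual" still appears with its article but is filed under W.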

Long lists can take a long time to download and are cumbersome to scroll through. They are usually presented in alphabetic ranges, as in Figure 3.18b. The user clicks a tag and a corresponding list of titles appears. In this example, the ranges have been automatically chosen so that each covers a reasonable number of documents; titles in the ranges J-L, Q-R, and U-Y have been merged because there are only a few under each of the letters. In fact, Figure 3.18a was generated in exactly the same way, but there are so few documents that there was just one overall range and so the alphabetic range selector was suppressed.
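One way such ranges might be chosen automatically is a greedy pass over the alphabet that keeps merging letters until a range holds enough titles. The sketch below is an assumption about how this could work, not the algorithm behind Figure 3.18b; the threshold `min_per_range`, the function names, and the sample titles are hypothetical, and only titles starting with A-Z are considered.

```python
from collections import Counter
import string

ASCII_LETTERS = set(string.ascii_uppercase)

def alphabetic_ranges(titles, min_per_range=3):
    """Return (first_letter, last_letter, count) tuples covering the titles."""
    counts = Counter(t[0].upper() for t in titles if t[:1].upper() in ASCII_LETTERS)
    ranges = []
    start, last, total = None, None, 0
    for letter in string.ascii_uppercase:
        n = counts.get(letter, 0)
        if n == 0:
            continue              # letters with no titles never start or end a range
        if start is None:
            start = letter
        last = letter
        total += n
        if total >= min_per_range:
            ranges.append((start, last, total))
            start, total = None, 0
    if start is not None:         # leftover letters with too few titles
        if ranges:
            first, _, count = ranges.pop()
            ranges.append((first, last, count + total))   # merge into previous range
        else:
            ranges.append((start, last, total))
    return ranges

def label(first, last):
    """Tag text for a range: a single letter, or 'X-Y' for merged letters."""
    return first if first == last else f"{first}-{last}"

# Hypothetical sample data.
sample = ["Apples", "Ants", "Arrows", "Bees", "Cows", "Corn", "Zebra"]
tags = [label(first, last) for first, last, _ in alphabetic_ranges(sample)]
```

With very few titles the function returns a single range, which matches the case where the range selector can simply be suppressed.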

This scheme does not scale up well. Tabs with multiletter labels such as Fae-Fal are inconvenient. Although such labels are used in dictionaries and telephone directories, users generally take a stab at their desired location on the basis of the book’s bulk, and then they might employ the tabs. Going through a sequence of decisions (F, Fa-Far, Fae-Fal, … ) is a tedious and unnatural way of narrowing down the search.

Figure 3.18: Browsing an alphabetical list of titles: (a) plain list; (b) with A-Z tags

The final tab, 0-9, presents another snag. Users can’t always know what characters titles start with, because titles sometimes start with punctuation characters, arithmetic operators, or mathematical symbols. Fortunately, this is not a big problem in English because such documents rarely occur and can be dealt with by a single Miscellaneous tab.

Dates

In Figure 3.19, newspapers are browsed by date. An automatically generated selector gives a choice of years; items within each range are laid out by month. Figures 3.18 and 3.19 were created automatically based on Title and Date metadata respectively; the year ranges are chosen to put a reasonable number of items on each page.
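A date browser of this kind can be sketched in two steps: lay items out by year and month, then pack consecutive years into pages of bounded size. The code is an illustration under stated assumptions, not the procedure that produced Figure 3.19; the page limit `max_per_page`, the function names, and the sample items are hypothetical.

```python
from collections import defaultdict
from datetime import date

def group_by_year_and_month(items):
    """items: iterable of (date, title). Returns {year: {month: [titles]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for when, title in sorted(items):
        grouped[when.year][when.month].append(title)
    return {year: dict(months) for year, months in grouped.items()}

def year_ranges(items, max_per_page=4):
    """Greedily pack consecutive years into pages of at most max_per_page items.

    A single year that exceeds the limit still gets a page of its own.
    """
    per_year = defaultdict(int)
    for when, _ in items:
        per_year[when.year] += 1
    pages, current, total = [], [], 0
    for year in sorted(per_year):
        if current and total + per_year[year] > max_per_page:
            pages.append((current[0], current[-1]))
            current, total = [], 0
        current.append(year)
        total += per_year[year]
    if current:
        pages.append((current[0], current[-1]))
    return pages

# Hypothetical sample data: (Date metadata, issue label).
items = [
    (date(2001, 1, 5), "a"), (date(2001, 1, 20), "b"), (date(2001, 6, 1), "c"),
    (date(2002, 3, 3), "d"),
    (date(2003, 2, 2), "e"), (date(2003, 2, 9), "f"),
    (date(2004, 7, 7), "g"), (date(2004, 8, 8), "h"), (date(2004, 9, 9), "i"),
]
grouped = group_by_year_and_month(items)
pages = year_ranges(items)
```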

Figure 3.19: Browsing by date

Hierarchies

The browsers introduced so far are restricted to linear classifications with a limited number of documents. In contrast, hierarchical structures are used in areas that have very large numbers of items. In the library world, the Library of Congress and Dewey Decimal classifications are used to categorize printed books (enabling placement of volumes treating similar subjects on neighboring shelves). These schemes are considered hierarchical because the beginning parts of the code provide a rough categorization that is refined by the later characters.

Figure 3.20 shows a hierarchical display used in the Humanity Development Library. Nodes of the hierarchy are represented as bookshelves. Clicking one opens it up and displays all the nodes that lie beneath, as well as any documents at that level. For example, node 2.00 in Figure 3.20b contains one document and eight subsidiary nodes, of which one, node 2.06, is shown in Figure 3.20c. Just as bookshelf icons represent internal nodes of the hierarchy, so book icons represent documents, the leaves of the classification tree. Figure 3.20b shows a book icon for the Earth Summit Report, which is the only document with classification 2.00.

This hierarchical structure was generated automatically from metadata. Each document is accompanied by its associated position in the hierarchy. In fact, because documents can appear in several places, the metadata is multivalued. The hierarchical information includes names for the interior nodes, which are used to label the "bookshelves" in Figure 3.20. This particular classification scheme is nonstandard, chosen by the collection designer as being appropriate for the intended users. Some digital library systems impose uniformity; others provide flexibility for collection designers to organize things however they see fit. The latter option gives librarians freedom to exercise their professional judgment.
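The idea of deriving a browsable tree from multivalued classification metadata can be sketched as follows. The dotted path codes ("2", "2.6"), the class and function names, and the second sample document are assumptions made for illustration; they are not the codes or data format used by the Humanity Development Library.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str = ""
    children: dict = field(default_factory=dict)    # code segment -> Node
    documents: list = field(default_factory=list)   # titles filed at this node

def build_hierarchy(node_names, documents):
    """node_names: {path: label}; documents: [(title, [paths])]."""
    root = Node(name="root")

    def find(path):
        # Walk (and create as needed) the nodes along a dotted path.
        node = root
        parts = path.split(".")
        for depth, part in enumerate(parts, start=1):
            prefix = ".".join(parts[:depth])
            if part not in node.children:
                node.children[part] = Node(name=node_names.get(prefix, prefix))
            node = node.children[part]
        return node

    for path in node_names:
        find(path)
    for title, paths in documents:
        for path in paths:          # multivalued: one document, several places
            find(path).documents.append(title)
    return root

def render(node, indent=0):
    """Return indented text lines: shelves for nodes, books for documents."""
    lines = []
    for key in sorted(node.children):
        child = node.children[key]
        lines.append("  " * indent + f"[shelf] {child.name}")
        lines.extend("  " * (indent + 1) + f"[book] {t}" for t in child.documents)
        lines.extend(render(child, indent + 1))
    return lines

# Hypothetical metadata; only the shelf names and the report title come from the text.
node_names = {
    "1": "General reference",
    "2": "Sustainable development",
    "2.6": "Organizations, institutions",
}
documents = [
    ("Earth Summit Report", ["2"]),
    ("Hypothetical Handbook", ["1", "2.6"]),
]
tree_lines = render(build_hierarchy(node_names, documents))
```

Because each document carries a list of positions, the same title can appear under more than one shelf without being duplicated in the collection itself.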

Facets

Given appropriate metadata, richer browsing options can be offered using the technique of faceted classification, which provides alternative navigation options and conveys information that helps users understand the content of the collections. Figure 3.21 illustrates facets in a search from the Australian Newspapers project. In Figure 3.21a the user has entered the query term Waikato (a region of New Zealand) and is viewing the result. There are over 1300 matching documents, the first ten being displayed in surrogate form, ranked, as usual, by relevance. The top item is an 1899 report in the Northern Territory Times that the SS Waikato was 42 days overdue on its trip from Vancouver to Auckland.

On the left are several categories—facets—that provide an alternative route for exploring and refining the results. The first, Title, is shown in its entirety, and the beginning of the second, Category, is also visible (scrolling down would reveal the rest). The labels under each facet show the number of documents tagged with that metadata value (e.g., 178 of the 1346 results came from the Argus newspaper), along with a visual representation as a bar graph. The user clicks on Argus to reveal the corresponding reduced set of matching documents in Figure 3.21b. The first matching item is a 1927 report about the Melbourne Cricket Club’s defeat of Waikato. Within this facet, the further facets displayed on the left can be explored by Category, Illustrated, Decade, and Word Count.
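The counts shown beside each facet value, and the narrowing that happens when the user clicks one, can be sketched in a few lines. This is an illustrative sketch only, not the Australian Newspapers implementation; the record layout, the function names, and the sample records are hypothetical.

```python
from collections import Counter

def facet_counts(results, facet_fields):
    """Count how many result documents carry each value of each facet."""
    counts = {f: Counter() for f in facet_fields}
    for doc in results:
        for f in facet_fields:
            value = doc.get(f)
            if value is not None:
                counts[f][value] += 1
    return counts

def refine(results, facet, value):
    """Keep only the documents tagged with the chosen facet value."""
    return [doc for doc in results if doc.get(facet) == value]

# Hypothetical search results, each with facet metadata.
results = [
    {"id": 1, "Title": "Argus", "Category": "Article", "Decade": "1920s"},
    {"id": 2, "Title": "Argus", "Category": "Advertising", "Decade": "1930s"},
    {"id": 3, "Title": "Northern Territory Times", "Category": "Article", "Decade": "1890s"},
]
facets = ["Title", "Category", "Decade"]

counts = facet_counts(results, facets)
argus_only = refine(results, "Title", "Argus")
argus_counts = facet_counts(argus_only, ["Category", "Decade"])
```

Recomputing the counts on the refined set is what lets the remaining facets reflect only the documents still in view.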

Figure 3.20: Browsing a classification hierarchy: (a) the beginning; (b) expanding Sustainable development;

Figure 3.20, cont’d: (c) expanding Organizations, institutions

In general, facets use one of the forms of browsing described above. The Australian Newspapers project uses lists throughout, broken up into a series of pages that resemble search results. One could also imagine a Subject facet structured as a hierarchy and an On this day facet in calendar format. Facets are chosen by librarians based on the metadata available to guide users around collections. Not every piece of metadata makes a good facet. There are guidelines for effective facets; for example, facets should be mutually exclusive, permanent, and represent clear divisions of the items.
