How To Build A Digital Library

Putting It All Together (Digital Library)

This section tours the Massachusetts Institute of Technology (MIT) institutional repository—a representative, publicly available digital library system—to illustrate the basic shape and form of many digital libraries today. The focus here is on the end-user’s perspective. In Section 7.6 we revisit this example to explore it from a digital librarian’s point of view. An institutional […]

Textual documents: The raw material (Digital Library)

Documents are the digital library’s building blocks. It is time to step down from our high-level discussion of digital libraries—what they are, how they are organized, and what they look like—to nitty-gritty details of how to represent the documents they contain. To do a thorough job for international documents in non-Roman alphabets, we will have […]

Representing Textual Documents (Digital Library)

Way back in 1963, at the dawn of interactive computing, the American National Standards Institute (ANSI) began work on a character set that would standardize text representation across a range of computing equipment and printers. (At the time, a variety of codes were in use by different computer manufacturers, such as an extension of a […]

Textual Images (Digital Library) Part 1

Plain text documents in digital libraries are often produced by digitizing paper documents. Digitization is the process of taking traditional library materials, typically in the form of books and papers, and converting them to electronic form, which can be stored and manipulated by a computer. Digitizing a large collection is a time-consuming and expensive process […]

Textual Images (Digital Library) Part 2

Checking and saving: The next stage of OCR is manual checking. The output is displayed on the screen, with problems highlighted in color. One color may be reserved for unrecognized and uncertainly recognized characters, another for words that do not appear in the dictionary. Different display options can suppress some of this information. The original […]

Web Documents: HTML and XML (Digital Library)

HTML, the Hypertext Markup Language, is the underlying document format of the World Wide Web, which makes it an important baseline for interactive viewing. Like all major, long-standing document formats, HTML has undergone growing pains, and its history reflects the anarchy that has characterized the Web’s evolution. Since HTML’s conception in 1989, its development has […]

Presenting Web Documents: CSS and XSL (Digital Library) Part 1

Two kinds of style sheet can be used to control the presentation of marked-up documents. Cascading style sheets (CSS) produce presentable documents with minimal effort. They were developed principally in support of HTML, but also work with XML. A parallel development is the extensible stylesheet language (XSL) for XML (and for versions of HTML that […]

Presenting Web Documents: CSS and XSL (Digital Library) Part 2

Context- and media-dependent formatting: There’s more to cascading style sheets. Using compound selectors, rules can detect when descendant or sibling tags match a particular pattern and produce different effects. Rules can trigger when attributes match particular patterns, and this facility can be combined with compound selectors. Figure 4.12 introduces some contrived formatting instructions into the […]

Page Description Languages: PostScript and PDF (Digital Library) Part 1

The purpose of page description languages is to express typeset documents in a way that is independent of the particular output device used. Early word-processing programs and drawing packages incorporated code for sending documents to particular printers and could not be used with other devices. With the advent of page description languages, programs can generate […]

Page Description Languages: PostScript and PDF (Digital Library) Part 2

Compatibility with Unicode: Character-identifier keyed, or CID-keyed, fonts provide a newer format designed for use with Unicode. They map multiple byte values to character codes in much the same way that the encoding vector works in base fonts, except that the mapping is not restricted to 256 entries. The CID-keyed font specification is independent of PostScript […]