Presenting Web Documents: CSS and XSL (Digital Library) Part 2

Context- and media-dependent formatting

There’s more to cascading style sheets. Using compound selectors, rules can detect when descendant or sibling tags match a particular pattern and produce different effects. Rules can trigger when attributes match particular patterns, and this facility can be combined with compound selectors. Figure 4.12 introduces some contrived formatting instructions into the running example to illustrate these points. Again it incorporates the formatting instructions of Figure 4.10a using the @import command.


Figure 4.12: (a) Cascading style sheet illustrating context-sensitive formatting; (b) viewing the result in an XML-enabled Web browser

The first rule uses the :before pseudo-element to tailor the content of a Photo according to the value of the desc attribute, but only when the Photo is a child of Agency. Omitting the > symbol would change the meaning to "descendant" rather than "child," and the rule would then trigger if the Photo node appeared anywhere beneath an Agency node.
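The difference between the child selector (Agency > Photo) and the descendant selector (Agency Photo) can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration of the two relationships, not a CSS engine: the tiny document is a hypothetical stand-in for Figure 4.7, and its Links element is invented purely so that one Photo is a grandchild rather than a child of its Agency.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the second Photo sits inside an invented Links
# element, so it is a grandchild of its Agency rather than a child.
DOC = """<Body>
  <Agency><Name>FAO</Name><Photo desc="Logo"/></Agency>
  <Agency><Name>World Bank</Name><Links><Photo desc="Map"/></Links></Agency>
</Body>"""


def child_matches(root):
    """Photo elements whose parent is an Agency (CSS selector: Agency > Photo)."""
    return [photo for agency in root.iter("Agency") for photo in agency.findall("Photo")]


def descendant_matches(root):
    """Photo elements anywhere beneath an Agency (CSS selector: Agency Photo).
    Assumes Agency elements are not nested inside one another."""
    return [photo for agency in root.iter("Agency") for photo in agency.iter("Photo")]


root = ET.fromstring(DOC)
print([p.get("desc") for p in child_matches(root)])       # ['Logo']
print([p.get("desc") for p in descendant_matches(root)])  # ['Logo', 'Map']
```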


The second rule suppresses the Photo text if its desc attribute matches the string "A photo." If the first rule appeared without the second, the text "A photo" would be shown for both the FAO and the World Bank records because the document’s DTD supplies this text as the default value for desc.

The third rule demonstrates the + syntax that is used to specify sibling context. When one Agency node follows another at the same level in the document tree, this rule alters its background and foreground colors. In the XML document of Figure 4.7, only the first Agency record in the document retains its default coloring. The rule also illustrates two different ways of specifying color: by name (red) and by specifying red, green, and blue components in hexadecimal.
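The + combinator matches an element only when its immediately preceding element sibling has the named type, which is why the first Agency keeps its default colors. A rough Python sketch of that test over one parent's children follows; the agency list is hypothetical, and a full matcher would repeat the check for every element in the tree.

```python
import xml.etree.ElementTree as ET

DOC = """<Body>
  <Agency><Name>FAO</Name></Agency>
  <Agency><Name>World Bank</Name></Agency>
  <Agency><Name>WHO</Name></Agency>
</Body>"""


def adjacent_sibling_matches(parent, first, second):
    """Children named `second` whose immediately preceding element sibling is
    named `first` (CSS selector: first + second). Only `parent`'s own
    children are examined."""
    children = list(parent)
    return [cur for prev, cur in zip(children, children[1:])
            if prev.tag == first and cur.tag == second]


body = ET.fromstring(DOC)
recolored = adjacent_sibling_matches(body, "Agency", "Agency")
print([a.findtext("Name") for a in recolored])  # ['World Bank', 'WHO']
```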

The next rule prints the full name of the FAO in the same color as the background, because the hq attribute of its Name tag matches "Rome, Italy." (The example would be more realistic if different colors were used here, but the figures are restricted to black and white.) It uses a third form of color specification, rgb(), which gives the color components in decimal; these in fact specify the same color as the previous hexadecimal assignment. This rule makes no sense in practice, but it explains why "(FAO)" is placed far to the right in Figure 4.12b: it is preceded by the now-invisible name.
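The equivalence between the hexadecimal form (#RRGGBB) and rgb() is plain base-16 to base-10 arithmetic on each two-digit component. The sketch below shows the conversion. The color used is only an example, since the actual values in Figure 4.12 are not reproduced here, and CSS's three-digit #RGB shorthand is deliberately not handled.

```python
def hex_to_rgb(color):
    """Convert a CSS '#RRGGBB' color to the decimal triple used by rgb()."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected #RRGGBB")
    # Each pair of hex digits is one component: 00..FF becomes 0..255.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_css(color):
    """Render the same color in CSS rgb() notation."""
    r, g, b = hex_to_rgb(color)
    return f"rgb({r}, {g}, {b})"


print(hex_to_rgb("#FF0000"))  # (255, 0, 0), the same color as the name red
print(rgb_css("#FF8000"))     # rgb(255, 128, 0)
```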

The last rule further illustrates inheritance by setting the font size for Agency to 20 points. This overrides the 16-point value set in the initial style sheet and is inherited by descendant nodes.
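Inheritance can be pictured as a top-down walk in which each element takes a property from its own rule if one applies and otherwise from its parent. This is a minimal sketch under simplifying assumptions: rules are keyed by tag name only, and a real cascade also weighs specificity, origin, and rule order. The 16pt and 20pt values come from the text, but attaching the 16pt rule to NGODoc is an assumption.

```python
import xml.etree.ElementTree as ET

DOC = "<NGODoc><Head/><Body><Agency><Name/><Abbrev/></Agency></Body></NGODoc>"

# Simplified rule table, tag name -> font-size. Placing the 16pt rule on
# NGODoc is an assumption made for the sketch.
RULES = {"NGODoc": "16pt", "Agency": "20pt"}


def computed_font_sizes(node, inherited=None, out=None):
    """Map each tag to its font-size: its own rule if present, else the parent's value."""
    if out is None:
        out = {}
    size = RULES.get(node.tag, inherited)
    out[node.tag] = size
    for child in node:
        computed_font_sizes(child, size, out)
    return out


sizes = computed_font_sizes(ET.fromstring(DOC))
print(sizes["Body"], sizes["Agency"], sizes["Name"])  # 16pt 20pt 20pt
```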

A key feature of cascading style sheets is the ability to handle different media, such as screen, print, handheld devices, computer projectors, text-only display, Braille, and audio. Figure 4.13 shows the idea. The @media command names the media type and gives rules, scoped to that media, within braces. The example first sets the Agency node globally to be a block with specified margins. Then for screen and projection media the font is set to 16-point Helvetica, while for print it is 12-point Times.
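The scoping of @media blocks can be modeled as a global rule set that is merged with whichever blocks list the requested medium. In this sketch the font settings follow the description above. The margin value is a placeholder, since the figure's actual margins are not given in the text, and the data structures are purely illustrative.

```python
# Global declarations for Agency, then declarations scoped by media type.
GLOBAL = {"Agency": {"display": "block", "margin": "1em"}}  # margin value is a placeholder
MEDIA_BLOCKS = [
    ({"screen", "projection"}, {"Agency": {"font-size": "16pt", "font-family": "Helvetica"}}),
    ({"print"}, {"Agency": {"font-size": "12pt", "font-family": "Times"}}),
]


def declarations_for(tag, medium):
    """Merge global declarations with those in @media blocks that list `medium`."""
    result = dict(GLOBAL.get(tag, {}))
    for media, rules in MEDIA_BLOCKS:
        if medium in media:
            result.update(rules.get(tag, {}))
    return result


print(declarations_for("Agency", "print"))
# {'display': 'block', 'margin': '1em', 'font-size': '12pt', 'font-family': 'Times'}
```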

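An @import command can be augmented to restrict the media type it applies to: one or more media types, such as print or handheld, are listed after the style sheet's URL, and the imported rules then apply only to those media.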

 

 


Figure 4.13: Using CSS to specify different formatting styles for different media

CSS continues to be developed. As with HTML and XHTML, the trend is to modularize the specification so that software implementers can state clearly which parts they support. As is common with Web technologies, browser implementations of CSS lag behind the formal publication of the standard. Different browser versions support different subsets of the specification (often with idiosyncratic bugs). Web designers spend considerable time checking the appearance of their site in different browsers on different operating system platforms. For a digital library you should test your document display on common browsers and be aware of the browser population used to access your content. Typically, you will be able to find out about your users’ browsers via your library’s usage data records (see Section 2.2).

Extensible stylesheet language

XSL, the extensible stylesheet language for XML, transcends CSS by allowing the stylesheet designer to transform documents radically. Parts can be duplicated, tables of contents can be created automatically, lists can be sorted. A price is paid for this expressive power—complexity.

The XSL specification is divided into three parts: formatting objects (FO), XSL transformations (XSLT), and XPath (a way of selecting parts of a document). Formatting objects map closely to CSS instructions and use the same property names wherever possible. XSL transformations manipulate the document tree, while XPath selects parts to transform. We expand on these later in this section.

CSS can be combined with facilities such as Web-page scripting and the document object model mentioned in Section 4.3 (under "Parsing XML") to provide functionality comparable to XSL; this combination is sometimes dubbed dynamic HTML. Experts fiercely debate which is the better approach. We think you should know about both, since the wider context in which you work often dictates the path you must tread. There is, however, one key difference between the two approaches. Because XSL is designed to work with XML, it cannot be used with all forms of HTML, since not all HTML is XML compliant. CSS operates in either setting: in HTML, for which it was designed, and in XML, because CSS places no restriction on the tag names its rules can target.

We introduce formatting in XSL by working through the examples used to illustrate CSS. However, XSL greatly extends CSS’s functionality. For example, it includes a model for pagination and layout that extends the page-by-page structure of paper documents to provide an equivalent to the "frames" used in Web pages. Its terminology is internationalized: padding-left, for instance, becomes padding-start, which makes more sense when dealing with languages that are written right to left. Similar direction-neutral names, such as space-before, space-after, and space-end, control the space above, below, and at the end of text, although the CSS names are still recognized for backward compatibility.

Figure 4.14 shows an XSL file for the initial version of the United Nations example. The XML syntax is far more verbose than its CSS counterpart in Figure 4.10a. Beyond the initial NGODoc declaration, it includes many of the keywords we saw in the earlier version. For example, CSS’s font-size: 25pt specification for the Title node now appears between nested tags whose inner and outer elements include the attributes font-size="25pt" and match="Title", respectively.


Figure 4.14: XSL style sheet for the basic United Nations agencies example

The style sheet is linked to the document of Figure 4.7 by adding the following line just after the XML declaration (the text of Figure 4.14 resides in a file called un_basic.xsl), exactly as we did before with CSS:

<?xml-stylesheet type="text/xsl" href="un_basic.xsl"?>

The result is a replica of Figure 4.10b, although both standards are complex and it is not uncommon to encounter small discrepancies.

Figure 4.14 begins with the standard XML declaration, followed by an <xsl:stylesheet> tag. As usual, this is a top-level root element that encloses all the other tags. Its attributes declare two namespaces: one for XSL itself, with the prefix xsl, and the other for Formatting Objects, with the prefix fo. Namespaces are an XML extension that keeps sets of elements designed for particular purposes separate; otherwise confusion would occur if both vocabularies happened to include a tag with the same name (such as block). Once the namespaces have been declared in Figure 4.14, every prefixed name is unambiguous: <xsl:template> can only refer to the XSL template element, and <fo:block> to the Formatting Objects block element.

Namespaces also incorporate semantic information. If an application encounters a namespace declaration whose value is http://www.w3.org/1999/XSL/Format, it interprets subsequent tag names according to a published specification. In the following discussion we focus on a subset of Formatting Object tags typically used in document-related XML style sheets. The full specification is more comprehensive.
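Python's ElementTree makes the effect of namespaces visible. After parsing, each tag is expanded to {namespace-URI}local-name, so elements with the same local name in different namespaces cannot be confused. The fragment below is a hypothetical minimal style sheet, not the text of Figure 4.14; the two URIs are the standard XSLT and Formatting Objects namespaces.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
FO_NS = "http://www.w3.org/1999/XSL/Format"

SHEET = f"""<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}" xmlns:fo="{FO_NS}">
  <xsl:template match="Title">
    <fo:block font-size="25pt"><xsl:apply-templates/></fo:block>
  </xsl:template>
</xsl:stylesheet>"""

root = ET.fromstring(SHEET)
# After parsing, the prefixes are gone; only the namespace URIs distinguish elements.
print(root.tag)  # {http://www.w3.org/1999/XSL/Transform}stylesheet
template = root.find(f"{{{XSL_NS}}}template")
block = template.find(f"{{{FO_NS}}}block")
print(block.get("font-size"))  # 25pt

# Prefixes can be used in queries again by supplying a prefix-to-URI mapping.
ns = {"xsl": XSL_NS, "fo": FO_NS}
print(len(root.findall("xsl:template/fo:block", ns)))  # 1
```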

Returning to Figure 4.14, the next tag sets the document’s output type. XSL style sheets perform transformations: they are used to transform an XML source document into another document. Because our style sheet is designed to format the document using Formatting Object tags, the output is set to xml. Other choices are html—in which case all the fo: scoped tags in the XSL file would need to be replaced with HTML tags—and text.

To perform the transformation, the input document is matched against the style sheet and a new document tree built from the result. First the document’s root node is compared with the XSL file’s <xsl:template> nodes until one is found whose match attribute corresponds to the node’s name. Then the body of the XSL template element is used to construct the tags in the output tree. If apply-templates is encountered, matching continues recursively on that document node’s children (or as we shall see later, on some other selected part of the document), and further child nodes in the output tree are built as a result.

In the example, the document’s root node matches <xsl:template match="NGODoc">. This adds several fo tags to the output tree—tags that initialize the page layout of the final document. Eventually <xsl:apply-templates> is encountered, which causes the document’s children <Head> and <Body> to be processed by the XSL file. When the matching operation has run its course, the document tree that it generates is rendered for viewing.
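The matching process can be imitated in a few lines of Python. This is a toy model under stated assumptions, not an XSLT processor: templates are keyed by element name only, each template builds output and calls apply_templates on the node's content, and unmatched elements simply have their content processed, loosely mirroring XSLT's built-in rules. The output tag names merely echo Formatting Objects, and the document is a hypothetical cut-down version of Figure 4.7.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document (a cut-down stand-in for Figure 4.7).
DOC = ("<NGODoc><Head><Title>UN Agencies</Title></Head>"
       "<Body><Agency><Name>FAO</Name></Agency></Body></NGODoc>")

TEMPLATES = {}  # element name -> template function


def template(match):
    """Decorator that registers a template for elements named `match`."""
    def register(fn):
        TEMPLATES[match] = fn
        return fn
    return register


def append_text(out, text):
    """Append text to the output element, after its last child if it has one."""
    if len(out):
        out[-1].tail = (out[-1].tail or "") + text
    else:
        out.text = (out.text or "") + text


def apply_templates(node, out):
    """Process the node's content in document order: copy text, match child elements."""
    if node.text:
        append_text(out, node.text)
    for child in node:
        process(child, out)
        if child.tail:
            append_text(out, child.tail)


def process(node, out):
    """Use the matching template if there is one; otherwise just process the content."""
    rule = TEMPLATES.get(node.tag)
    if rule is not None:
        rule(node, out)
    else:
        apply_templates(node, out)


@template("NGODoc")
def ngodoc_rule(node, out):
    flow = ET.SubElement(out, "flow")  # stands in for the page-layout setup
    apply_templates(node, flow)


@template("Title")
def title_rule(node, out):
    block = ET.SubElement(out, "block", {"font-size": "25pt"})
    apply_templates(node, block)


@template("Agency")
def agency_rule(node, out):
    block = ET.SubElement(out, "block", {"space-before": "6pt"})
    apply_templates(node, block)


result = ET.Element("root")
process(ET.fromstring(DOC), result)
print(ET.tostring(result, encoding="unicode"))
```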

The fourth template rule specifies its match attribute as Head | Body to catch Head or Body nodes. Although it achieves the same effect as commas in CSS, this new syntax is part of a more powerful and general standard called XPath. The last template rule also introduces brackets around the abbreviation. The select="." attribute in that rule is again an XPath specification. The "." is a way of selecting the current position, or "here"; in this context it selects the current node (Abbrev), whose text is then output. This usage is adapted from the use of a period (.) in a file name to specify the current directory.
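Both XPath ideas can be mirrored in Python: a union pattern such as Head | Body matches an element if either alternative matches, and "." refers to the current node, whose text the Abbrev rule wraps in brackets. Treating a pattern as a set of '|'-separated names is a deliberate simplification; real XPath patterns are far richer. The document is hypothetical.

```python
import xml.etree.ElementTree as ET


def matches_union(element, pattern):
    """True if the element's name is one of the '|'-separated names in `pattern`."""
    return element.tag in {name.strip() for name in pattern.split("|")}


def abbrev_rule(abbrev):
    """Output the current node's text ('.') surrounded by brackets."""
    current = abbrev.find(".")  # '.' selects the element itself
    return "(" + "".join(current.itertext()) + ")"


doc = ET.fromstring("<NGODoc><Head/><Body><Agency><Abbrev>FAO</Abbrev></Agency></Body></NGODoc>")
print([child.tag for child in doc if matches_union(child, "Head | Body")])  # ['Head', 'Body']
print(abbrev_rule(doc.find(".//Abbrev")))  # (FAO)
```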

Using Formatting Objects

Formatting Objects provide similar capabilities to CSS: margins, borders, padding, foreground and background color, blocks, in-line text, tables with rows and cells, and so on. Many CSS declarations are simply mapped into fo tag names and attributes with the same name.

Figure 4.15 shows an XSL style sheet for the version of the United Nations Agencies example illustrated in Figure 4.11, with records embedded in a table and the title formatted with a bullet point. Like the CSS version, the file inherits from the basic XSL style sheet. This is done using the <xsl:import> tag, whose href attribute supplies the appropriate URL.

The first template rule processes the <Title> node, which starts by wrapping a list-block and list-item around the core information. Using a Unicode character that lies beyond the normal ASCII range, it then inserts a list-item-label whose content is a bullet point, before setting up the list-item-body with the content of the Title tag.

Next, instead of using <xsl:apply-templates> to recursively process any nested elements as was done in the first XSL example, this rule specifies <xsl:apply-imports>. This searches the imported style sheets for a rule that also matches the current tag (Title) and applies that rule as well (if several style sheets are imported, a rule from one imported later takes precedence over one imported earlier). The result is to nest the settings given in the Title rule of un_basic.xsl inside the current formatting, and then fire the <xsl:apply-templates> statement specified in that rule. The overall effect provides an inheritance facility similar to that of CSS.
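The interplay between a rule and <xsl:apply-imports> can be modeled as a chain of rule tables: the importing sheet's rules are tried first, and apply-imports asks only the sheets further down the chain for a rule matching the same element. This sketch simplifies XSLT considerably (a linear import chain, rules keyed by element name, output built as strings), and the markup strings are placeholders; in particular, the real list structure also needs list-item-label and list-item-body.

```python
# Each sheet maps element names to functions taking (text, apply_imports).
BASIC = {  # stands in for un_basic.xsl
    "Title": lambda text, apply_imports: f"<block font-size='25pt'>{text}</block>",
}
LISTS = {  # stands in for the importing style sheet of Figure 4.15
    "Title": lambda text, apply_imports: f"<list-item>\u2022 {apply_imports(text)}</list-item>",
}


def process(tag, text, sheets):
    """Apply the highest-precedence rule for `tag`. sheets[0] is the importing
    sheet; each later entry is treated as imported by the one before it."""
    for i, sheet in enumerate(sheets):
        if tag in sheet:
            # apply-imports: look for a rule only in the sheets this one imports.
            apply_imports = lambda t, rest=sheets[i + 1:]: process(tag, t, rest)
            return sheet[tag](text, apply_imports)
    return text  # no rule anywhere: pass the text through unchanged


print(process("Title", "UN Agencies", [LISTS, BASIC]))
# <list-item>• <block font-size='25pt'>UN Agencies</block></list-item>
```

The importing rule wraps whatever the imported rule produces, which is how the bullet formatting ends up nested around the basic Title formatting.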

The remaining template rules use fo elements for table, table row, and table cell that correspond to the same entities in CSS and are bound to the same element names in the source document. Attributes within these elements provide similar table formatting: a silver-colored table with white cells using a mixture of border styles and padding.

Some complications stem from the stricter requirements of the Formatting Objects specification. First, tables must include a body, whereas the equivalent structure in CSS is optional. In the example, the table-body element appears in the rule for Body, so this rule encodes both the table and table-body elements. This is not possible in the CSS example: both structures are selected through the display property, and a rule can give an element only one display value. To avoid the conflict, the source document would need two tag names: one mapping to table and the other to table-body.

A second complication is that fo:block elements cannot be placed directly inside fo:table-body and fo:table-row elements. That is why the two rules containing these elements must resort to <xsl:apply-templates> in their recursive processing of the document (instead of <xsl:apply-imports>) and duplicate the formatting attributes already present in the imported file.

Context- and media-dependent formatting

Figure 4.16 reworks the Figure 4.12 version of the United Nations example and illustrates context-based matching using contrived formatting instructions.

The key to context-based matching in XSL is the XPath mechanism. In many operating system interfaces, multiple files can be selected using wildcard characters; for example, project/*/file.html selects all files of this name within any subdirectory of project. XPath generalizes this to select individual sections of a document. This is done by mapping nodes in the document tree into a string that defines their position in the hierarchy. These strings are expressed just as file names are in a directory hierarchy, with node names separated by slashes. For example, in our document NGODoc/Body/* selects all the Agency nodes.
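ElementTree implements a small subset of XPath, enough to try the path expression from the text. One detail differs: once the document is parsed, the NGODoc element is the starting point, so the path is written relative to it and NGODoc/Body/* becomes Body/*. The document is a hypothetical stand-in for Figure 4.7.

```python
import xml.etree.ElementTree as ET

DOC = """<NGODoc>
  <Head><Title>United Nations Agencies</Title></Head>
  <Body>
    <Agency><Name hq="Rome, Italy">Food and Agriculture Organization</Name></Agency>
    <Agency><Name hq="Washington, DC">World Bank</Name></Agency>
  </Body>
</NGODoc>"""

root = ET.fromstring(DOC)            # root is the NGODoc element
agencies = root.findall("Body/*")    # every child of Body, whatever its name
print([a.tag for a in agencies])     # ['Agency', 'Agency']

# Wildcards can appear mid-path, much like project/*/file.html.
print([n.text for n in root.findall("Body/*/Name")])
```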

This idea is augmented to condition access on attributes stored at nodes. For example, Name[@desc] matches a Name node only if it has a desc attribute defined. Built-in predicates are supplied to check the position of a node in the tree—for example, whether it is the first or last in a chain of siblings.
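The same ElementTree subset supports attribute tests and positional predicates. The sketch applies [@desc] to Photo, the element that carries desc in this example; it also tests an attribute value with [@hq='Rome, Italy'], and position among siblings with [1] and [last()]. The data are hypothetical. Note that ElementTree does not apply DTD default attribute values, so a Photo without an explicit desc does not match [@desc] here, whereas it would in a validating setup where the DTD supplies the default.

```python
import xml.etree.ElementTree as ET

DOC = """<Body>
  <Agency><Name hq="Rome, Italy">FAO</Name><Photo desc="A photo"/></Agency>
  <Agency><Name hq="Washington, DC">World Bank</Name><Photo/></Agency>
  <Agency><Name hq="Geneva, Switzerland">WHO</Name></Agency>
</Body>"""

body = ET.fromstring(DOC)
print(len(body.findall("Agency/Photo[@desc]")))           # 1: only one Photo has desc
print(body.find("Agency/Name[@hq='Rome, Italy']").text)   # FAO
print(body.find("Agency[1]/Name").text)                   # FAO (first Agency)
print(body.find("Agency[last()]/Name").text)              # WHO (last Agency)
```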


Figure 4.15: XSL style sheet illustrating tables and lists


Figure 4.16: XSL style sheet illustrating context-sensitive formatting

The first template rule in Figure 4.16 inserts the text that is stored as a Photo node’s desc attribute into the document, prefixed by "Available." The second is more selective and only matches if the Photo node’s desc attribute contains the text "A photo"—which happens to coincide with its default given in the DTD. If it matches, no text is displayed, and recursive template matching down that part of the tree is abandoned.
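When both of these rules could match the same Photo, XSLT picks one by priority. By default a bare name pattern such as Photo has priority 0, while a pattern with a predicate, such as one testing desc, has priority 0.5, so the more selective rule wins. The Python sketch below hard-codes just those two default priorities and is not a general pattern matcher. It also models the desc test as simple equality, and the exact "Available" prefix formatting is an assumption.

```python
import xml.etree.ElementTree as ET

# (default priority, test, action) for two Photo rules.
RULES = [
    (0.0, lambda e: e.tag == "Photo",
          lambda e: "Available: " + e.get("desc", "")),  # prefix format is assumed
    (0.5, lambda e: e.tag == "Photo" and e.get("desc") == "A photo",
          lambda e: ""),  # output nothing and do not descend further
]


def apply_best_rule(element):
    """Run the action of the highest-priority rule whose test matches, or return None."""
    candidates = [(prio, action) for prio, test, action in RULES if test(element)]
    if not candidates:
        return None
    _, action = max(candidates, key=lambda c: c[0])
    return action(element)


print(repr(apply_best_rule(ET.fromstring('<Photo desc="A photo"/>'))))  # ''
print(apply_best_rule(ET.fromstring('<Photo desc="Headquarters"/>')))   # Available: Headquarters
```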

The third rule, which works in conjunction with the fourth, demonstrates XSL modes. When an Agency node is first encountered, rule 3 fires and sets up the basic formatting for the block. When it comes to recursively applying the template match, it selects itself with select=".", switches the mode to Extra Color, and then rematches on Agency. This time only rule 4 can match (because of the mode), which enforces the additional requirement that Agency must be at least the second node in the file. If so, the rule uses <xsl:attribute> tags to augment the closest enclosing tag (the main block for Agency) with attributes for foreground and background colors.
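Modes can be thought of as a second key in the template lookup: the same element is matched by different rules depending on the mode in force. In this sketch rules are indexed by (element name, mode). The mode name extra_color, the position test inside the second rule, and the color values are assumptions standing in for the contents of Figure 4.16, and output blocks are plain dictionaries.

```python
import xml.etree.ElementTree as ET

DOC = "<Body><Agency>FAO</Agency><Agency>World Bank</Agency><Agency>WHO</Agency></Body>"


def agency_default(agency, position, out):
    """Like rule 3: build the block, then re-process the same node ('.') in extra_color mode."""
    block = {"text": agency.text}
    apply_rule(agency, position, "extra_color", block)
    out.append(block)


def agency_extra_color(agency, position, block):
    """Like rule 4: from the second Agency onward, add color attributes to the enclosing block."""
    if position >= 2:
        block["color"] = "white"           # placeholder colors
        block["background-color"] = "red"


# Rules are looked up by (element name, mode); None plays the role of the default mode.
RULES = {
    ("Agency", None): agency_default,
    ("Agency", "extra_color"): agency_extra_color,
}


def apply_rule(node, position, mode, target):
    """Dispatch to the rule registered for this element name in this mode, if any."""
    rule = RULES.get((node.tag, mode))
    if rule is not None:
        rule(node, position, target)


blocks = []
for position, agency in enumerate(ET.fromstring(DOC), start=1):
    apply_rule(agency, position, None, blocks)
print(blocks)
```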

Finally, the remaining rule sets the foreground color the same as the background color for any Name node whose hq attribute matches "Rome, Italy."


Figure 4.17: XSL style sheet that sorts United Nations agencies alphabetically

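XSL style sheets can also be tailored to different output media, such as screen and printer. One standard way to restrict a style sheet to printers is to add media="print" to the xml-stylesheet processing instruction that links it to the document. The <xsl:output> element, which we have already seen used to set the output type to XML, has a related media-type attribute, but that attribute declares the MIME type of the result rather than a target device.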

Processing in XSL

Our examples have shown XSL’s ability to transform the source document, but the changes have been slight (such as putting brackets around the content of an Abbrev tag) and could all have been achieved using CSS. Figure 4.17 shows a style sheet that sorts the United Nations agencies alphabetically for display, something CSS can’t do. It imports un_basic.xsl to provide some general formatting and then defines a rule for Body that performs the sorting, overriding the match that would have occurred against Head | Body in the imported file.

First a block is created that maintains the same margins and spacing provided by the basic style file. Then a recursive match is initiated on all Agency nodes that are descendants of the Body node. In earlier examples apply-templates has been written as a single empty element, <xsl:apply-templates/>, which combines the opening and closing tags. This shorthand is convenient for straightforward matches. Here we separate the opening and closing tags and supply the sort criterion through the nested xsl:sort element. To accomplish the desired result, the example sets the data type to string and specifies a sort on the child nodes of Agency called Name.
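In Python terms, the sort corresponds to collecting the Agency descendants of Body and ordering them by the string value of each Agency's Name child. This sketch uses a hypothetical document, and Python compares strings by code point, whereas an XSLT processor may apply language-sensitive collation for text sorts.

```python
import xml.etree.ElementTree as ET

DOC = """<Body>
  <Agency><Name>World Health Organization</Name></Agency>
  <Agency><Name>Food and Agriculture Organization</Name></Agency>
  <Agency><Name>World Bank</Name></Agency>
</Body>"""

body = ET.fromstring(DOC)
# All Agency descendants of Body, in document order.
agencies = body.findall(".//Agency")
# Order by the Name child's text, compared as strings.
ordered = sorted(agencies, key=lambda a: a.findtext("Name", default=""))
print([a.findtext("Name") for a in ordered])
# ['Food and Agriculture Organization', 'World Bank', 'World Health Organization']
```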

This example only scratches the surface. XSL can perform a vast array of transformations—even for sorting there are many more attributes that control the ordering. Other language constructs include variables, if statements, and for statements. XSL contains many elements of programming languages, making it impressively versatile, and it is finding use in places that even its designers did not envision.
