Adding Chunk, Phrase, Paragraph, and List objects Part 2 (iText 5)

Distributing text over different lines

In the movie_paragraphs_1.pdf document (listing 2.8), all the information about a movie is in one Paragraph. For most of the movies, the content of this Paragraph doesn’t fit on one line, and iText splits the string, distributing the content over different lines. The default behavior of iText is to put as many complete words on a line as possible. iText splits sentences when a space or a hyphen is encountered, but you can change this behavior by redefining the split character.

THE SPLIT CHARACTER

If you want to keep two words separated by a space character on the same line, you shouldn’t use the normal space character, (char)32; you should use the nonbreaking space character, (char)160.

Next you’ll create a StringBuffer containing all the movies by Stanley Kubrick, and you’ll concatenate them into one long String, separated with pipe symbols (|). In the movie titles, you’ll replace the ordinary space character with a nonbreaking space character.
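As a rough sketch of this step (it is not the book’s listing, and it uses no iText classes), the following plain Java builds such a string. The class name MovieChainText, the method chain(), and the sample titles are made up for illustration, and StringBuilder stands in for the StringBuffer mentioned above.

```java
import java.util.Arrays;
import java.util.List;

final class MovieChainText {
    // Nonbreaking space, (char)160.
    static final char NBSP = '\u00a0';

    /** Joins the titles with '|' after replacing ordinary spaces with nonbreaking spaces. */
    static String chain(List<String> titles) {
        StringBuilder sb = new StringBuilder();
        for (String title : titles) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(title.replace(' ', NBSP));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = chain(Arrays.asList("Paths of Glory", "Lolita"));
        // No ordinary space is left, so indexOf(' ') returns -1.
        System.out.println(text.indexOf(' '));
    }
}
```

The resulting String contains no (char)32 at all, which is exactly why the default split rules find nothing to break on.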

Listing 2.10 MovieChain.java

Because you’ve replaced the space characters, iText can’t find any of the default split characters in chunk1. The text will be split into different lines, cutting words in two just before the first character that no longer fits on the line. Then you add the same content a second time, but you define the pipe symbol (|) as a split character.

Next is a possible implementation of the SplitCharacter interface. You can add an instance of this custom-made class to a Chunk with the method setSplitCharacter().

Listing 2.11 PipeSplitCharacter.java

The method that needs to be implemented looks complicated, but in most cases it’s sufficient to copy the method shown in the previous listing and change the return line. If you’re working with Asian glyphs, you may also add these ranges of Unicode characters:

[Image not reproduced: ranges of Unicode characters]

The result is shown in the upper part of figure 2.6.

In Paragraph A, the content is split at unusual places. The word "Love" is split into "Lo" and "ve," and the final "s" in the word "Paths" is orphaned. For the Chunks in Paragraph B, a split character was defined: the pipe character (|). Paragraph C shows what the content looks like if you don’t replace the normal spaces with nonbreaking spaces.
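To see why the paragraphs break so differently, it can help to model the line breaking in plain Java. The sketch below is a simplification under stated assumptions: it counts characters instead of measuring glyph widths, it doesn’t use iText, and the names SplitDemo, wrap(), DEFAULT, and PIPE are hypothetical. In iText itself, the decision is made by an implementation of the SplitCharacter interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class SplitDemo {
    // Default rule described in the text: split at spaces (and control characters) and hyphens.
    static final IntPredicate DEFAULT = c -> c <= ' ' || c == '-';
    // Custom rule: the pipe symbol is a split character as well.
    static final IntPredicate PIPE = c -> c == '|' || c <= ' ' || c == '-';

    /** Greedy wrapping: width is a number of characters, a stand-in for the real line width. */
    static List<String> wrap(String text, int width, IntPredicate isSplit) {
        if (width < 1) {
            throw new IllegalArgumentException("width must be at least 1");
        }
        List<String> lines = new ArrayList<>();
        int start = 0;
        while (text.length() - start > width) {
            int cut = -1;
            // Look backward for the last split character that still fits on the line.
            for (int i = start + width - 1; i >= start; i--) {
                if (isSplit.test(text.charAt(i))) {
                    cut = i + 1; // the split character stays at the end of the line
                    break;
                }
            }
            if (cut < 0) {
                cut = start + width; // nothing to split on: cut in mid-word
            }
            lines.add(text.substring(start, cut));
            start = cut;
        }
        lines.add(text.substring(start));
        return lines;
    }

    public static void main(String[] args) {
        String text = "Paths\u00a0of\u00a0Glory|Lolita|Spartacus";
        System.out.println(wrap(text, 16, DEFAULT)); // first line is cut in mid-word
        System.out.println(wrap(text, 16, PIPE));    // first line ends right after a pipe
    }
}
```

Because the nonbreaking space (char)160 is greater than ' ', the DEFAULT rule never matches it, which mirrors the behavior seen in Paragraph A.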

Figure 2.6 Splitting paragraphs

HYPHENATION

This listing is similar to listing 2.10, except it doesn’t replace the ordinary space characters. Another Chunk attribute is introduced: hyphenation.

Listing 2.12 MovieChain.java

In this listing, you create a HyphenationAuto object using four parameters. iText uses hyphenation rules found in XML files named en_US.xml, en_GB.xml, and so on. The first two parameters refer to these filenames. The third and fourth parameters specify how many characters may be orphaned at the start or at the end of a word. For instance, you wouldn’t want to split the word elephant like this: e-lephant. It doesn’t look right if a single letter gets cut off from the rest of the word.
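The effect of the third and fourth parameters can be sketched without iText. The following plain Java filters candidate break positions by a minimum number of leading and trailing characters. The class HyphenationLimits, the method allowedBreaks(), and the candidate positions are invented for illustration and don’t come from a real hyphenation file. In iText, a call along the lines of new HyphenationAuto("en", "US", 2, 2) would express the same limits.

```java
import java.util.ArrayList;
import java.util.List;

final class HyphenationLimits {
    /**
     * Keeps only the break positions that leave at least leftMin characters
     * before the hyphen and at least rightMin characters after it.
     * A position p means "break between index p - 1 and index p".
     */
    static List<Integer> allowedBreaks(String word, int[] candidates, int leftMin, int rightMin) {
        List<Integer> allowed = new ArrayList<>();
        for (int p : candidates) {
            if (p >= leftMin && word.length() - p >= rightMin) {
                allowed.add(p);
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        // Hypothetical candidates: e-lephant (1), ele-phant (3), elephan-t (7).
        int[] candidates = { 1, 3, 7 };
        System.out.println(allowedBreaks("elephant", candidates, 2, 2)); // prints [3]
    }
}
```

With both limits set to 2, the breaks that would orphan a single letter at either end of the word are rejected, and only ele-phant remains.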

FAQ: I use setHyphenation(), but my text isn’t hyphenated. Where do I find the XML file I need?

If you try the example in listing 2.12, and not one word is hyphenated, you’ve probably forgotten to add itext-hyph-xml.jar to your classpath. In this JAR, you’ll find files such as es.xml, fr.xml, de_DR.xml, and so on. These XML files weren’t written by iText developers; they were created for Apache’s Formatting Objects Processor (FOP). The XML files bundled in itext-hyph-xml.jar are a limited set, and your code won’t work if you’re using a language for which no XML file was provided in this JAR. In that case, you’ll have to find the appropriate file on the internet and add it to a JAR in your classpath. Don’t forget to read the license before you start using a hyphenation file; some of those files can’t be used for free.

The hyphenated text is added twice: once with the default space/character ratio, and once with a custom space/character ratio.

THE SPACE/CHARACTER RATIO

The Paragraph objects D and E from listing 2.12 have a justified alignment. This alignment is achieved by adding extra space between the words and between the characters. In Paragraph D, you see the default spacing. The ratio is 2.5, meaning that iText adds 2.5 times more space between the words than between the characters to match the exact length of each line.

You can change this ratio with the PdfWriter.setSpaceCharRatio() method. This is done for Paragraph E. On the lower-right side of figure 2.6, you can see that no extra space is added between the characters, only between the words, because the ratio was changed to NO_SPACE_CHAR_RATIO (which is in reality a very high float value).
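The arithmetic behind the ratio can be sketched as follows. This is a simplified model, not iText’s code: it assumes the leftover width of a line is shared so that every gap between words receives ratio times what every gap between characters receives. The class JustifySpacing, the method spacing(), and the sample numbers are hypothetical.

```java
final class JustifySpacing {
    /**
     * Splits the leftover width of a justified line into a per-character gap
     * and a per-word gap, where the word gap is ratio times the character gap.
     * Returns {characterGap, wordGap}.
     */
    static double[] spacing(double leftover, int wordGaps, int charGaps, double ratio) {
        double denominator = ratio * wordGaps + charGaps;
        if (denominator == 0) {
            return new double[] { 0.0, 0.0 }; // nothing to stretch
        }
        double characterGap = leftover / denominator;
        return new double[] { characterGap, ratio * characterGap };
    }

    public static void main(String[] args) {
        double[] d = spacing(30.0, 4, 20, 2.5);
        // 4 word gaps and 20 character gaps share 30 units at a ratio of 2.5.
        System.out.println(d[0] + " per character gap, " + d[1] + " per word gap");
    }
}
```

With a leftover of 30 units, 4 word gaps, 20 character gaps, and the default ratio of 2.5, this model yields 1 unit per character gap and 2.5 units per word gap. With a very large ratio, nearly all 30 units go to the word gaps (about 7.5 each), which matches what you see in Paragraph E.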

The List object: a sequence of Paragraphs called ListItem

In the previous examples, you’ve listed movies, directors, and countries. In the next example you’ll repeat this exercise, but instead of presenting the data as an alphabetically sorted series of movie titles, you’ll create a list of countries, along with the number of movies in the database that were produced in that country. You’ll list those movies, and for every movie you’ll list its director(s).

ORDERED AND UNORDERED LISTS

To achieve this, you’ll use the List object and a number of ListItem objects. As you can see in the UML diagram (figure 2.1), ListItem extends Paragraph. The main difference is that every ListItem has an extra Chunk variable that acts as a list symbol.

A first version of this report was created using ordered and unordered lists. The list symbol for ordered lists can be numbers—which is the default—or letters. The letters can be lowercase or uppercase—uppercase is the default. The default list symbol for unordered lists is a hyphen.

Listing 2.13 MovieLists1.java

Note that it’s not always necessary to create a ListItem instance. You can also add String items directly to a List; a ListItem will be created internally for you.

CHANGING THE LIST SYMBOL

Next is a variation on the same theme.

Listing 2.14 MovieLists2.java

For the list with countries, you now define an indentation of half an inch for the list symbol. You also define a different list symbol for every item, namely the database ID of the country. The difference for the movie list is subtler: you tell iText that it shouldn’t realign the list items. In listing 2.13, iText looks at all the items in the List and uses the maximum indentation for all the items. By adding the line movielist.setAlignindent(false) in listing 2.14, every list item now has its own list indentation based on the space taken by the list symbol. That is, unless you’ve added the line list.setAutoindent(false), in which case the indentation specified with setSymbolIndent() is used.
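The three indentation rules just described can be modeled in a few lines of plain Java. This is only a sketch of the rules as stated in the text, not iText’s implementation. The class ListIndent, the method indents(), and the sample symbol widths are hypothetical. The value 36 assumes the default of 72 points per inch for the half-inch symbol indentation.

```java
final class ListIndent {
    /**
     * Returns the indentation applied to each item.
     * autoindent == false: every item uses the fixed symbolIndent.
     * autoindent == true and alignindent == true: every item uses the widest symbol.
     * autoindent == true and alignindent == false: every item uses its own symbol width.
     */
    static float[] indents(float[] symbolWidths, boolean autoindent,
                           boolean alignindent, float symbolIndent) {
        float max = 0f;
        for (float w : symbolWidths) {
            max = Math.max(max, w);
        }
        float[] result = new float[symbolWidths.length];
        for (int i = 0; i < symbolWidths.length; i++) {
            if (!autoindent) {
                result[i] = symbolIndent;
            } else if (alignindent) {
                result[i] = max;
            } else {
                result[i] = symbolWidths[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        float[] widths = { 10f, 25f, 15f }; // made-up symbol widths in points
        System.out.println(java.util.Arrays.toString(indents(widths, true, true, 36f)));
        System.out.println(java.util.Arrays.toString(indents(widths, true, false, 36f)));
        System.out.println(java.util.Arrays.toString(indents(widths, false, false, 36f)));
    }
}
```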

As you can see in figure 2.7, a period (.) symbol is added to each list symbol for ordered lists. You can override this behavior with the methods setPreSymbol() and setPostSymbol(). In listing 2.14, the pre- and postsymbols are defined in such a way that you get "Director 1:", "Director 2:", and so on, as list symbols (shown at the top-right in figure 2.7).
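As a minimal sketch of how such symbols are put together, the plain Java below concatenates a presymbol, the item’s number or letter, and a postsymbol. It doesn’t use iText. The class ListSymbols and its methods are hypothetical, and the lettered variant only handles items 1 to 26.

```java
final class ListSymbols {
    /** Symbol for a numbered item: preSymbol + number + postSymbol. */
    static String numbered(String preSymbol, int index, String postSymbol) {
        return preSymbol + index + postSymbol;
    }

    /** Symbol for a lettered item (1 becomes A or a); this sketch handles 1..26 only. */
    static String lettered(String preSymbol, int index, boolean lowercase, String postSymbol) {
        if (index < 1 || index > 26) {
            throw new IllegalArgumentException("index must be between 1 and 26");
        }
        char base = lowercase ? 'a' : 'A';
        return preSymbol + (char) (base + index - 1) + postSymbol;
    }

    public static void main(String[] args) {
        System.out.println(numbered("", 1, "."));           // 1.
        System.out.println(numbered("Director ", 2, ":"));  // Director 2:
        System.out.println(lettered("", 2, false, "."));    // B.
    }
}
```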

SPECIAL TYPES OF LISTS

Four more variations are shown in figure 2.7. First, in listing 2.15, you’ll create List objects of type RomanList, GreekList, and ZapfDingbatsNumberList. In listing 2.16, you’ll create a ZapfDingbatsList.

Listing 2.15 MovieLists3.java

Figure 2.7 List and ListItem variations

Be careful not to use ZapfDingbatsNumberList for long lists. This list variation comes in four different types defined with a parameter in the constructor that can be 0, 1, 2, or 3, corresponding to specific types of numbered bullets. Note that the output will only be correct for items 1 to 10, because there are no bullets for numbers 11 and higher in the font that is used to draw the bullets.
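The limit of ten items follows from how the numbered bullets sit in the font. The sketch below rests on an assumption that isn’t stated in the text: that the four series of numbered bullets occupy consecutive runs of ten character codes in ZapfDingbats, starting at 172, 182, 192, and 202. Treat those codes, the class DingbatNumbers, and its methods as illustrative rather than as iText’s API.

```java
final class DingbatNumbers {
    // ASSUMPTION: code of the bullet for item 1 of each type (0..3) in ZapfDingbats.
    static final int[] FIRST_CODE = { 172, 182, 192, 202 };

    /** Character code that would be used for the given item (1-based) of the given type. */
    static int bulletCode(int type, int item) {
        return FIRST_CODE[type] + item - 1;
    }

    /** Only items 1 to 10 have a numbered bullet of their own. */
    static boolean hasBullet(int item) {
        return item >= 1 && item <= 10;
    }

    public static void main(String[] args) {
        // Under the assumption above, item 11 of type 0 lands on the code of item 1 of type 1.
        System.out.println(bulletCode(0, 11) == bulletCode(1, 1));
    }
}
```

Under that assumption, an eleventh item doesn’t fail outright; it silently picks up a glyph that belongs to another series, which is why the output looks wrong rather than empty.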

ZapfDingbats is one of the 14 standard Type 1 fonts. It contains a number of special symbols, such as a hand with the index finger pointing to the right: (char)42. This symbol is used in listing 2.16 for the director list. The special list class for this type of list is called ZapfDingbatsList. This is the superclass of ZapfDingbatsNumberList.

Listing 2.16 also shows how to change the first index of an ordered list using setFirst(), and how to set a custom list symbol for the entire list with setListSymbol().

Listing 2.16 MovieLists4.java

We’ll conclude this section with a number of objects that aren’t shown on the class diagram in figure 2.1: vertical position marks and separator Chunks.
