Introducing the XML Forms Architecture (XFA) Part 1 (iText 5)


In this section, you’ll be introduced to the XML structure that’s used to define an XFA form, and we’ll try out different alternatives to fill out different types of XFA forms.

If you look at figure 8.10, it’s hard to see any difference between it and the forms you’ve created and used in previous examples. End users won’t notice this is a different kind of form.

With the next listing, you can inspect the PDF from the inside, and you’ll find out it’s an XFA form.


Figure 8.10 A static XFA form

Listing 8.18 XfaMovie.java


In this listing, you use the isXfaPresent() method to find out whether the form is an XFA form or an AcroForm. You also list the names of the fields. This is the content of the text file with the results:


iText tells us that this is an XFA form, and it returns a list of fields with square brackets in their names. These square brackets are typical for XFA. When listing 8.18 generates a result like this, your PDF contains two different form descriptions: one using XFA technology and one using AcroForm technology. You can conclude that this is a static XFA form.

Static XFA forms

Let’s pretend you don’t know you’re working with a static XFA form, and fill in the form using the AcroFields class and the setField() method. This code will work correctly for most of the forms you’ll encounter.

Listing 8.19 XfaMovie.java


iText will fill out the AcroForm (as it did in all previous examples), and it will make a fair attempt at filling out the XFA form simultaneously. In most cases, this works transparently: you don’t even notice that the two different technologies exist next to each other. However, if you look at figure 8.11, you’ll see that this is an example where iText fails. Although you provided values for the year, duration, and the IMDB ID, the corresponding fields remain empty.


Figure 8.11 Partially filled-in form

I deliberately created the three fields in a way that isn’t supported by iText. This way I can explain the mechanism of XFA form filling with iText. There are different workarounds to deal with this problem, but let’s inspect the XFA form first.

XFA FORMS: INTERNAL STRUCTURE

When creating an AcroForm using iText, you implicitly create PdfDictionary, PdfArray, and other PdfObject instances. XFA forms are totally different; they aren’t defined using PDF objects. XFA forms are described in an XML stream that’s embedded in the PDF file.

You can extract this XML into a separate file.

Listing 8.20 XfaMovie.java


Note that the Document in this code snippet isn’t a com.itextpdf.text.Document object, but an instance of org.w3c.dom.Document. Transforming this Document into an XML file is done using different classes from the javax.xml.transform package. This is a shortened version of the resulting file.
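The serialization step described here can be sketched with JDK classes alone. The example below is a minimal illustration, not the book’s listing: it builds a small stand-in org.w3c.dom.Document (the element names and values are placeholders) and writes it out with javax.xml.transform. In the XFA case, the Document would come from the form rather than being built by hand.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToXmlSketch {

    // Serializes any org.w3c.dom.Document to an indented XML string.
    static String toXml(Document doc) throws TransformerException {
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Builds a small stand-in document. Names and values are placeholders
    // chosen for this sketch; they are not taken from the real form.
    static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element movies = doc.createElement("movies");
        Element movie = doc.createElement("movie");
        movie.setAttribute("year", "2000");
        Element title = doc.createElement("title");
        title.setTextContent("Example title");
        movie.appendChild(title);
        movies.appendChild(movie);
        doc.appendChild(movies);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(sampleDocument()));
    }
}
```

To write to a file instead of a string, pass a StreamResult that wraps a FileOutputStream to the same transform() call.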

Listing 8.21 movie_xfa.xml


If you want to understand the full XML file that was extracted, you’ll need to consult the XFA specification for more info about the elements that can be found inside an XFA form:

■ config, localeSet, xmp, … —The XFA XML can contain application-defined information and XFA grammar: configuration information, localization info, metadata, information about web connections, and so on. Except for the config tag, I’ve omitted these tags, because they’re outside the scope of this topic.

■ template—This is where the appearance and behavior of the form is defined.

■ datasets—This contains all the sets of data used with the form.

■ data—Contains the data held by fields in the form.

■ dataDescription—Defines the schema for the data.

As you can see, XFA separates data from the XFA template, which allows greater flexibility in the structure of the data supported and allows data to be packaged separately from the form.
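The separation between template and data is easy to see once the XML is parsed. The following sketch walks a hypothetical, heavily shortened XFA skeleton and lists the elements it finds at the top level and inside datasets. The sample string is an assumption made for illustration, and its namespace URIs should be treated as illustrative rather than copied from a real form.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XfaPacketsSketch {

    // Hypothetical skeleton of an extracted XFA XML file (all content omitted).
    static final String SAMPLE =
          "<xdp:xdp xmlns:xdp='http://ns.adobe.com/xdp/'>"
        + "<config xmlns='http://www.xfa.org/schema/xci/2.8/'/>"
        + "<template xmlns='http://www.xfa.org/schema/xfa-template/2.8/'/>"
        + "<xfa:datasets xmlns:xfa='http://www.xfa.org/schema/xfa-data/1.0/'>"
        + "<xfa:data/>"
        + "<dd:dataDescription xmlns:dd='http://ns.adobe.com/data-description/'/>"
        + "</xfa:datasets>"
        + "</xdp:xdp>";

    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getLocalName()
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the local names of the direct child elements, in document order.
    static List<String> childElementNames(Element parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getLocalName());
            }
        }
        return names;
    }

    // Returns the first direct child element with the given local name, or null.
    static Element firstChild(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Element root = parse(SAMPLE).getDocumentElement();
        System.out.println(childElementNames(root));
        System.out.println(childElementNames(firstChild(root, "datasets")));
    }
}
```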

FAQ Can I use iText to change the properties and appearance of an XFA form? Yes and no. In the previous sections of this topic, you’ve used iText to manipulate the appearance of AcroForm fields, but none of these examples will change the XML definition inside the XFA stream. Changing an XFA form has to be done using XML tools. First extract the XFA XML from the PDF. Then add, remove, and update the tags and attributes between the <template> and </template> tags. Once this is done, use iText to replace the existing XFA stream with the updated one.

The template specification is described in about 300 pages in the XFA reference. It would lead us too far off topic to get into the details of manipulating an XFA form, but we’ll look at how to replace the full XFA XML in the next example, after changing the data.

THE DATA SPECIFICATION

The datasets section of the XFA form consists of a data and a dataDescription element. You can use any schema you want for the data. This is one of the major advantages of choosing the XFA approach instead of using AcroForms.

The dataDescription specification comprises 16 pages in the XFA reference. Here’s the introduction:

The XFA data description syntax is more concise and readable than XML Schema but does not do as much. XFA data descriptions do not include defaults and do not support validation of text content. They do, however, fully describe the namespaces, element names, attribute names, and the hierarchy which joins them.

XML Forms Architecture (XFA) Specification Version 3.1 Part 2 topic 21

Let’s take a look at the changes made by iText to the data element to find out why three fields weren’t filled out in figure 8.11. We’ll reuse listing 8.20 on the resulting PDF file to have a look at the data element that was filled by iText.

Listing 8.22 movie_filled.xml


The data description in listing 8.21 expects the content of the fields "imdb", "duration", and "year" to be added as attributes of the movie tag. When you filled the form using listing 8.19, iText used a shortcut: it wrongly assumed that all data should be added between tags, not as attributes. There are three workarounds for this problem:

■ Change the form—Make sure the form doesn’t expect data added as attributes. This may not be an option, because you want the data inside the XFA form to be an identical match with the XML files you’re using in your business process.

■ Use XML tools to fill out the data—This is the most elegant solution. We’ll discuss two possible ways to achieve this. In listing 8.23 we’ll replace the complete XFA XML; then, in section 8.6.2, we’ll let iText replace the data element in a programmer-friendly way.

■ Remove the XFA form, keep the AcroForm—This is your only option if you want to flatten the form. The resulting form will no longer contain XFA technology— the result will be a pure AcroForm.
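The “use XML tools” workaround can be illustrated with the JDK’s DOM API. The sketch below assumes a hypothetical movie element in which the values for imdb, duration, and year ended up as child elements, and moves them into attributes of the same name. The XML shape, the helper names, and the values are assumptions made for this example; the real data element may look different.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeFixSketch {

    // Hypothetical data: three values sit between tags instead of in attributes.
    static final String SAMPLE =
          "<movies><movie imdb='' duration='' year=''>"
        + "<title>Example title</title>"
        + "<imdb>0000000</imdb><duration>100</duration><year>2000</year>"
        + "</movie></movies>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the first direct child element with the given tag name, or null.
    static Element directChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // For each name: copy the child element's text into an attribute of the
    // same name on the parent, then remove the child element.
    static void promoteToAttributes(Element parent, String... names) {
        for (String name : names) {
            Element child = directChild(parent, name);
            if (child != null) {
                parent.setAttribute(name, child.getTextContent().trim());
                parent.removeChild(child);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        Element movie = directChild(doc.getDocumentElement(), "movie");
        promoteToAttributes(movie, "imdb", "duration", "year");
        System.out.println("year attribute: " + movie.getAttribute("year"));
        System.out.println("year element left: " + (directChild(movie, "year") != null));
    }
}
```

Elements that aren’t named, such as title, are left untouched, so the approach only changes the fields that the data description expects as attributes.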

The first option should be done with the tool that was used to create the form in the first place. Replacing the XML data can be done the hard way or the easy way; let’s look at the hard way first.

REPLACING THE XFA STREAM

Suppose that you’ve updated the XFA XML manually and saved it in a file referred to as xml. Now you want to take the XFA form src and replace its XFA stream with the updated XML, producing the new XFA form dest as a result.

Listing 8.23 XfaMovie.java

For this example, I’ve changed the XFA XML manually. I’ve replaced the XML snippet shown in listing 8.22 with this one.

Listing 8.24 xfa.xml

The result is shown in figure 8.12. All the fields are now filled in correctly.

Figure 8.12 Correctly filled-out XFA form

The code in listing 8.23 is rather complex. We’ll find a better way to replace only the data XML in section 8.6.2. You can use this method, however, if you want to change other parts of the XFA form, for instance its appearance.

Note that iText doesn’t parse what’s inside the template tag. One of the consequences is that you have to use an XML tool to apply changes to the form. Another consequence is that iText can’t flatten a pure XFA form; iText can’t translate the XFA syntax to draw a form field, captions, and lines into PDF syntax. A form can only be flattened with iText if it’s also defined using AcroForm technology.

CHANGING AN XFA FORM INTO AN ACROFORM

If a form is defined twice, once using XFA technology and once as an AcroForm, you can choose to remove the XFA technology with the removeXfa() method.

Listing 8.25 XfaMovie.java


If you run listing 8.18 on the resulting form, you get the following output:


After this operation, you can use all the iText functionality discussed in sections 8.2 to 8.5. That’s an advantage. The disadvantage is that you lose all the benefits you can have from XFA. This only works for static XFA forms with an AcroForm counterpart; it won’t work for dynamic XFA forms.

Dynamic XFA forms

One of the major advantages of XFA is that you can define forms that grow dynamically. In traditional PDF files, the layout of the content is fixed: the coordinate of every dot, every line, every glyph on the page is known in advance. PDF was created because there was a need for a document format that was predictable. When you create a document containing three pages, you don’t want it to be rendered as a document with two or four pages when opened on another OS or using a different viewer application. XFA makes an exception to this rule. A dynamic XFA form can grow depending on the data that’s entered.

XML DATA

Suppose that your movie data is stored as an XML file using this XML schema.

Listing 8.26 movies.xsd


Here is a shortened example of an XML file that follows this schema. The full version contains 120 movies.

Listing 8.27 movies.xml


If you want to create a form that can be filled with all the information in this XML file, regardless of the number of movies, the number of directors per movie, and the number of countries per movie, you need to use Adobe LiveCycle Designer.
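To make the “regardless of the number” requirement concrete, the sketch below validates generated movie documents against a small schema in which director and country may repeat without limit. Both the schema and the element names are hypothetical stand-ins written for this example; they are not the contents of movies.xsd or movies.xml.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class MovieSchemaSketch {

    // Hypothetical schema: one title, then one or more directors and countries.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='movies'><xs:complexType><xs:sequence>"
        + "<xs:element name='movie' maxOccurs='unbounded'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='director' type='xs:string' maxOccurs='unbounded'/>"
        + "<xs:element name='country' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Builds a one-movie document with the requested number of repeated elements.
    static String movies(int directors, int countries) {
        StringBuilder sb = new StringBuilder("<movies><movie><title>Example title</title>");
        for (int i = 1; i <= directors; i++) {
            sb.append("<director>Director ").append(i).append("</director>");
        }
        for (int i = 1; i <= countries; i++) {
            sb.append("<country>Country ").append(i).append("</country>");
        }
        return sb.append("</movie></movies>").toString();
    }

    // Returns true if the XML validates against the schema. A SAXException
    // (invalid document or unusable schema) is reported as false.
    static boolean isValid(String xsd, String xml) throws IOException {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 director, 1 country: " + isValid(XSD, movies(1, 1)));
        System.out.println("3 directors, 2 countries: " + isValid(XSD, movies(3, 2)));
        System.out.println("0 directors, 1 country: " + isValid(XSD, movies(0, 1)));
    }
}
```

A form bound to data like this can’t reserve a fixed number of director or country fields in advance, which is why a fixed-layout form isn’t enough here.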
