First published in 2004 in DevX
Although using XSLT to process XML is increasingly common, most developers still use it only occasionally—and often treat it as just another procedural language. But that's not the best way to use XSLT. Learn how to simplify and improve your XSLT processing using event-driven and declarative techniques.
XML appears in some form in most modern applications and often needs to be transformed from one form into another: merged, split, massaged, or simply reformatted into HTML. In most cases, it's far more robust and efficient to use XSLT to perform such transformations than to use common programming languages such as Java, VB.NET, or C#. But because XSLT is an add-on rather than a core language, most developers use XSLT only occasionally, and have neither the time nor the resources to dive into the peculiarities of XSLT development or to explore the declarative and event-driven paradigms that efficient use of XSLT requires.
Such occasional use carries a danger: techniques that suit mainstream languages such as Java, C, and Python can lead to disastrous results when applied to XSLT.
You can avoid these problems by working through the simple, thoroughly explained exercises below, which apply several well-known programming approaches to XSLT tasks.
An XSLT processor takes an XML document as input, processes it, and outputs the content in (usually) some altered form, such as XML, HTML, or text. Here's a simple XML document that serves as the basis for the input examples in this article:
<?xml version="1.0"?>
<bookstore>
  <book isbn="1-56592-235-2" lang="en">
    <author>David Flannagan</author>
    <title>JavaScript: The Definitive Guide</title>
  </book>
  <book isbn="1-56592-235-1" lang="en">
    <author>David Flannagan</author>
    <title>JavaScript: The Definitive Guide</title>
  </book>
  <book isbn="0-471-40399-7" lang="en">
    <author>Dan Margulis</author>
    <title>Photoshop 6 for Professionals</title>
  </book>
</bookstore>
The document describes several books in a bookstore, providing the ISBN number, a language code, author, and title for each book.
Flow-driven XSLT
Suppose you needed to extract all the book titles in the following form:
<?xml version="1.0" ?>
<titles>
  <title>JavaScript: The Definitive Guide</title>
  <title>JavaScript: The Definitive Guide</title>
  <title>Photoshop 6 for Professionals</title>
</titles>
A flow-driven XSLT stylesheet example might look like this:
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/bookstore"> ❶
    <titles>
      <xsl:for-each select="book"> ❷
        <xsl:call-template name="process-book"/>
      </xsl:for-each>
    </titles>
  </xsl:template>
  <xsl:template name="process-book">
    <xsl:copy-of select="title"/>
  </xsl:template>
</xsl:stylesheet>
The stylesheet matches the <bookstore> root element right away at ❶ and then enforces the control flow by visiting each <book> element at ❷, using the combination of the for-each and call-template instructions.
The example above is somewhat incomplete as it does not give exactly the same output as the one defined in the problem definition. Indeed, once you launch it, the result is one long line resembling this:
<?xml version="1.0"?> <titles><title>JavaScript: The Definitive Guide</title><title>JavaScript: The Definitive Guide</title><title>Photoshop 6 for Professionals</title></titles>
To format it nicely, you have to add one more statement to the XSLT stylesheet:
<xsl:output indent="yes" encoding="utf-8"/>
The indent="yes" activates the indentation. It is also wise to specify an output encoding explicitly, even though UTF-8 is the default encoding for XSLT.
Now, suppose you make the input file a bit more complex, introducing sections and rows to locate books more easily in the bookstore:
<?xml version="1.0"?>
<bookstore>
  <section num="1">
    <row num="1">
      <book isbn="1-56592-235-2" lang="en">
        <author>David Flannagan</author>
        <title>JavaScript: The Definitive Guide</title>
      </book>
      <book isbn="1-56592-235-1" lang="en">
        <author>David Flannagan</author>
        <title>JavaScript: The Definitive Guide</title>
      </book>
      <book isbn="0-471-40399-7" lang="en">
        <author>Dan Margulis</author>
        <title>Photoshop 6 for Professionals</title>
      </book>
    </row>
  </section>
</bookstore>
If you try to continue in the flow-driven way, the XSLT must grow considerably (and as you'll see, needlessly) to adapt to the format change, adding templates to iterate over and process the <section> and <row> elements:
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes" encoding="utf-8"/>
  <xsl:template match="/bookstore">
    <titles>
      <xsl:for-each select="section">
        <xsl:call-template name="process-section"/>
      </xsl:for-each>
    </titles>
  </xsl:template>
  <xsl:template name="process-section">
    <xsl:for-each select="row">
      <xsl:call-template name="process-row"/>
    </xsl:for-each>
  </xsl:template>
  <xsl:template name="process-row">
    <xsl:for-each select="book">
      <xsl:call-template name="process-book"/>
    </xsl:for-each>
  </xsl:template>
  <xsl:template name="process-book">
    <xsl:copy-of select="title"/>
  </xsl:template>
</xsl:stylesheet>
Event-driven XSLT
Fortunately, you can make the transformation much simpler by using matched templates. A matched template is one the XSLT processor triggers when its "match" attribute matches the current (context) node, whether that attribute holds simply the name of a tag or a more complex XPath expression. For example, the processor will trigger the following template whenever the context node is an element that has a "lang" attribute (the at sign, @, denotes an attribute node rather than an element node).
<xsl:template match="*[@lang]">
  <xsl:text>This element has the following language id: </xsl:text>
  <xsl:value-of select="@lang"/>
</xsl:template>
By processing the file through matched templates, the code makes as few assumptions as possible about the format of the input file. For example, the following stylesheet outputs exactly the same result for both input files, even though their hierarchical formats differ significantly. Here's the revised stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes" encoding="utf-8"/>
  <xsl:template match="/"> ❶
    <xsl:element name="titles"> ❷
      <xsl:apply-templates select="node()"/> ❸
    </xsl:element>
  </xsl:template>
  <xsl:template match="book">
    <xsl:copy-of select="title"/>
  </xsl:template>
</xsl:stylesheet>
This event-driven version matches the root node of the document at ❶, whatever the root element is called, by using the single forward slash (/) syntax. Next, it outputs the root <titles> tag at ❷ and instructs the processor to continue with the children of the current or context node (the root node in this case) through the apply-templates call at ❸.
If you apply this stylesheet to the second input file, you'll get the following result:
<titles> <title>JavaScript: The Definitive Guide</title> <title>JavaScript: The Definitive Guide</title> <title>Photoshop 6 for Professionals</title> </titles>
The output is indeed the same as for the first input file, except for one minor annoyance. There are some gratuitous carriage returns before and after the <title> tags that cause the extra white space in the output.
After trying to determine the cause of these extra carriage returns, an occasional XSLT programmer might just drop the simple event-driven approach altogether in favor of the more complex flow-driven one. But if you instead explore the XSLT specification, you'll find a built-in template that copies text (and attribute values) through and thus outputs the carriage returns:
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>
In the example above, the carriage returns come from the whitespace-only text nodes that sit between the child elements of <bookstore>, <section>, and <row> in the input document.
To correct that, you can add one line to the event-driven stylesheet that matches text() nodes as follows:
<xsl:template match="text()"/>
That line gets rid of the carriage returns by overriding the built-in text template using a custom version that produces no output.
The key point to take away here is that almost any useful XSLT stylesheet should override at least two of the built-in templates: the one for text, shown above, and the one that matches the root node and all element nodes, which is:
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>
The built-in template for nodes copies nothing to the output, but by invoking the apply-templates instruction it makes the processor continue with the children of the current node.
You can gain fine-grained control over extra whitespace characters in the XSLT output by using the <xsl:preserve-space> and <xsl:strip-space> constructs in the stylesheet, or by using the xml:space attribute on XML tags in the input files.
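As a minimal sketch of the whitespace constructs just mentioned, the stylesheet below strips whitespace-only text nodes from every input element while keeping them inside <title>. The choice of <title> as the preserved element is only an assumption for illustration; the more specific name wins over the wildcard because such conflicts are resolved in the same way as conflicts between template rules.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes" encoding="utf-8"/>
  <!-- Remove whitespace-only text nodes from all elements of the input... -->
  <xsl:strip-space elements="*"/>
  <!-- ...except inside <title> (illustrative choice), where they are kept -->
  <xsl:preserve-space elements="title"/>
  <xsl:template match="/">
    <titles>
      <xsl:apply-templates/>
    </titles>
  </xsl:template>
  <xsl:template match="book">
    <xsl:copy-of select="title"/>
  </xsl:template>
</xsl:stylesheet>
```

With the whitespace-only nodes stripped on input, the built-in text template has nothing to copy between the elements, so this version needs no empty text() template for the bookstore documents used in this article.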
Imperative XSLT
Unlike most programming languages, XSLT does not favor sequential execution. This shows in the verbosity of the related language constructs such as choose and for-each, and in the weak support for side effects (no variables in the traditional sense).
The following common task illustrates the verbosity of the imperative approach: construct an HTML table that places the author names from the input document in rows and alternates the colors of odd and even rows:
<?xml version="1.0" encoding="utf-8"?>
<table>
  <tr>
    <td style="color:red;">David Flannagan</td>
  </tr>
  <tr>
    <td style="color:blue;">David Flannagan</td>
  </tr>
  <tr>
    <td style="color:red;">Dan Margulis</td>
  </tr>
</table>
Here's how you can accomplish the task in the imperative style:
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes" encoding="utf-8"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="row"> ❶
    <table>
      <xsl:for-each select="book[1]"> ❷
        <xsl:call-template name="process-book"> ❸
          <xsl:with-param name="even" select="false()"/>
        </xsl:call-template>
      </xsl:for-each>
    </table>
  </xsl:template>
  <xsl:template name="process-book"> ❹
    <xsl:param name="even"/>
    <xsl:choose>
      <xsl:when test="$even">
        <tr><td style="color:blue;"><xsl:value-of select="author"/></td></tr>
      </xsl:when>
      <xsl:otherwise>
        <tr><td style="color:red;"><xsl:value-of select="author"/></td></tr>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:for-each select="following-sibling::book[1]"> ❺
      <xsl:call-template name="process-book">
        <xsl:with-param name="even" select="not($even)"/>
      </xsl:call-template>
    </xsl:for-each>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
The stylesheet creates one table for each <row> element in the input, so it first matches the row tag at ❶. Then, it uses the for-each construct ❷ to change the execution context to the first book node and calls the process-book template ❸ with a parameter that controls the row color in the HTML table. The process-book template ❹ then outputs the row, in either red or blue depending on the value of the parameter, and calls itself ❺ to process the next book element with the opposite parameter value.
As you can see, this processing method gets complex very quickly, and you'd need to alter it for every format alteration in the input XML file.
Declarative XSLT
For XSLT, declarative is the opposite of the common imperative or algorithmic strategy; that is, an XSLT programmer does not define a sequence of actions that form an algorithm but rather sets a number of rules that the result should satisfy.
The declarative nature of the language lets you place templates anywhere and in any order in the XSLT document, because order has no impact on the resulting document. This rule applies except in cases of conflict resolution, where order is the last decision criterion.
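To illustrate how conflicts are resolved before order comes into play, here is a small sketch in which two templates match the same <book> element. The output element names (margulis, other, books) are made up for this example. In XSLT 1.0 a pattern with a predicate gets a default priority of 0.5 and a plain element name gets 0, so the more specific rule wins even though it appears first in the file.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes" encoding="utf-8"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <books>
      <xsl:apply-templates/>
    </books>
  </xsl:template>
  <!-- Specific rule: default priority 0.5 because of the predicate -->
  <xsl:template match="book[author='Dan Margulis']">
    <margulis><xsl:value-of select="title"/></margulis>
  </xsl:template>
  <!-- General rule: default priority 0; placed last, yet it does not
       override the specific rule for the books that both rules match -->
  <xsl:template match="book">
    <other><xsl:value-of select="title"/></other>
  </xsl:template>
</xsl:stylesheet>
```

Swapping the two book templates should not change the result; position in the stylesheet only matters when two matching templates end up with the same priority.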
Here is a stylesheet written with the declarative approach that provides the same output:
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes" encoding="utf-8"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <table>
      <xsl:apply-templates select="@*|node()"/>
    </table>
  </xsl:template>
  <xsl:template match="book[(position() mod 2)=0]">
    <tr><td style="color:blue;"><xsl:value-of select="author"/></td></tr> ❶
  </xsl:template>
  <xsl:template match="book[(position() mod 2)=1]">
    <tr><td style="color:red;"><xsl:value-of select="author"/></td></tr> ❷
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
In contrast to the procedural approach, this version doesn't define any algorithm. Instead, it specifies two templates for the processor to match: one for even-numbered books ❶ and one for odd-numbered books ❷. The processor outputs the contents in blue for even-numbered elements and in red for odd-numbered elements, so the first book comes out red, as in the imperative version.
Key indexing in XSLT
You can simplify a fair portion of XSLT processing if you understand how to use keys. Keys in XSLT have more or less the same meaning that indexes have in relational databases, except that in XSLT, keys index hierarchical structure rather than relational structure. It's easiest to explain keys with an example.
Imagine that you need to count the number of book copies available for each book title and display them in an HTML table, where each row looks like this:
... <tr><td>JavaScript: The Definitive Guide</td><td>2</td></tr>
Here is a possible solution that illustrates the use of the keys:
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes" encoding="utf-8"/>
  <xsl:key name="kbook" match="book" use="title"/> ❶
  <xsl:template match="/">
    <table>
      <xsl:apply-templates select="node()|@*"/>
    </table>
  </xsl:template>
  <xsl:template match="book">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td><xsl:value-of select="count(key('kbook',title))"/></td> ❷
    </tr>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
In the preceding example, the key declaration ❶ has three parts: the name of the key, used to refer to it later in the code; the match, that is, the element or attribute of the input data to be indexed; and the use, which is an XPath expression that defines the key itself. XPath is a language for addressing parts of an XML document, designed to be used by XSLT and XPointer. See the full language specification for more information.
In this particular case, the expression <xsl:key name="kbook" match="book" use="title"/> literally means: create a key with the name kbook on all the book tags and group them by title. The "book" template uses the key by calling the function key() ❷ with two parameters: the name of the key and the value of the index as defined in the use attribute of the key declaration. In this case, that value is simply "title", the child of the context <book> node.
As you might expect, this stylesheet produces two identical lines for the book "JavaScript: The Definitive Guide", as shown below.
<?xml version="1.0" encoding="utf-8"?>
<table>
  <tr>
    <td>JavaScript: The Definitive Guide</td>
    <td>2</td>
  </tr>
  <tr>
    <td>JavaScript: The Definitive Guide</td>
    <td>2</td>
  </tr>
  <tr>
    <td>Photoshop 6 for Professionals</td>
    <td>1</td>
  </tr>
</table>
That leads to another common XSLT problem: removing duplicates.
Removing Duplicates: the Muenchian Method
Because XSLT is an almost side-effect-free declarative language, the problem of removing duplicates—ridiculously simple in imperative languages such as C++ or Java—becomes overly complicated. But fortunately, an elegant solution exists, so unexpected that it even earned its own name, "Muenchian," because Steve Muench was reportedly the first to discover it.
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes" encoding="utf-8"/>
  <xsl:key name="kbook" match="book" use="title"/> ❶
  <xsl:template match="/">
    <table>
      <xsl:apply-templates select="node()|@*"/>
    </table>
  </xsl:template>
  <xsl:template match="book">
    <xsl:if test="generate-id()=generate-id(key('kbook',title)[1])"> ❷
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="count(key('kbook',title))"/></td>
      </tr>
    </xsl:if>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
Notice that the key declaration in this example ❶ is identical to the previous example. You use the generate-id() function ❷ to obtain a unique id for each node, which ensures that every time you pass in the same node you get the same id. The test therefore succeeds only when the context book is the first node that the key returns for its title, so each title is output exactly once.
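A common variant of the same technique moves the generate-id() comparison out of the template and into the select expression, so that only the first book of each title group is ever dispatched. The sketch below assumes the same kbook key and the bookstore input used throughout this article.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes" encoding="utf-8"/>
  <xsl:key name="kbook" match="book" use="title"/>
  <xsl:template match="/">
    <table>
      <!-- Keep a book only if it is the first node the key returns for its title -->
      <xsl:apply-templates
          select="//book[generate-id() = generate-id(key('kbook', title)[1])]"/>
    </table>
  </xsl:template>
  <xsl:template match="book">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td><xsl:value-of select="count(key('kbook', title))"/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```

Because the root template selects the book elements directly, this version does not rely on the built-in templates to walk the tree and so needs no empty text() template.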
Using Complex Keys in XSLT
Because the use attribute of the key definition is an XPath expression, it's possible to create quite elaborate indexes that rely upon complex XPath statements. As an example, generate-id() makes a unique key for every <book> node.
<xsl:key name="kbook" match="book" use="generate-id()"/>
The International Standard Book Number, or ISBN (sometimes pronounced "is-ben"), is a unique identifier for books, intended to be used commercially. The following declaration indexes each book by the outcome of the ISBN checksum test: the key value is true if the ISBN fails the test and false if it passes.
You can find the check digit of an ISBN by first multiplying each digit of the ISBN by that digit's place in the number sequence, with the leftmost digit being multiplied by 1, the next digit by 2, and so on. Next, take the sum of these multiplications and calculate the sum modulo 11, with "10" represented by the character "X". As an example, for the ISBN 1-56592-235-2, the calculation would be (1*1 + 2*5 + 3*6 + 4*5 + 5*9 + 6*2 + 7*2 + 8*3 + 9*5) mod 11. The translate function deletes all the dash (-) characters from the @isbn attribute value. The following example uses the substring function to extract each character from the string returned by translate.
<!-- Key value is true when the ISBN fails the check and false when it passes -->
<xsl:key name="kbook" match="book"
  use="boolean((substring(translate(@isbn, '-',''), 1,1) * 1 + ❶
                substring(translate(@isbn, '-',''), 2,1) * 2 + ❷ ❸
                substring(translate(@isbn, '-',''), 3,1) * 3 +
                substring(translate(@isbn, '-',''), 4,1) * 4 +
                substring(translate(@isbn, '-',''), 5,1) * 5 +
                substring(translate(@isbn, '-',''), 6,1) * 6 +
                substring(translate(@isbn, '-',''), 7,1) * 7 +
                substring(translate(@isbn, '-',''), 8,1) * 8 +
                substring(translate(@isbn, '-',''), 9,1) * 9) mod 11
               - substring(translate(@isbn, '-',''),10,1))
       or (boolean((substring(translate(@isbn, '-',''), 1,1) * 1 +
                    substring(translate(@isbn, '-',''), 2,1) * 2 +
                    substring(translate(@isbn, '-',''), 3,1) * 3 +
                    substring(translate(@isbn, '-',''), 4,1) * 4 +
                    substring(translate(@isbn, '-',''), 5,1) * 5 +
                    substring(translate(@isbn, '-',''), 6,1) * 6 +
                    substring(translate(@isbn, '-',''), 7,1) * 7 +
                    substring(translate(@isbn, '-',''), 8,1) * 8 +
                    substring(translate(@isbn, '-',''), 9,1) * 9) mod 11 != 10)
           and (substring(translate(@isbn, '-',''),10,1)) = 'X')"/>
In the key declaration, each term ❶ multiplies one digit of the ISBN by its position in the number. The substring function ❷ extracts each character from the string returned by translate, and the translate function ❸ deletes all hyphens from the @isbn attribute value. The first clause is true when a numeric check digit differs from the weighted sum modulo 11; the second clause is true when the check character is "X" but the remainder is not 10.
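For reference, the calculation for the sample ISBN 1-56592-235-2 can be worked through as follows, writing d_i for the i-th of the first nine digits (the notation is introduced here only for this derivation):

```latex
\[
\sum_{i=1}^{9} i \, d_i
  = 1\cdot 1 + 2\cdot 5 + 3\cdot 6 + 4\cdot 5 + 5\cdot 9
  + 6\cdot 2 + 7\cdot 2 + 8\cdot 3 + 9\cdot 5
  = 189,
\qquad
189 \bmod 11 = 2 .
\]
```

The remainder 2 equals the final digit, so this ISBN passes the check. The second book in the sample input, 1-56592-235-1, has the same first nine digits but ends in 1, so it fails.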
Branching vs. Modes in XSLT
XSLT's branching powers are weak compared to the branching statements of conventional languages. Instead, you can use the powerful mechanism of modes—often unexplored by occasional XSLT programmers.
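As a minimal sketch of what a mode does, the stylesheet below processes the same <book> elements twice, once in each of two modes, and a different template fires each time. The mode names and the output element names (report, titles, authors) are invented for this illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes" encoding="utf-8"/>
  <xsl:template match="/">
    <report>
      <titles>
        <!-- First pass over the books: only mode="title" templates apply -->
        <xsl:apply-templates select="//book" mode="title"/>
      </titles>
      <authors>
        <!-- Second pass over the same books: only mode="author" templates apply -->
        <xsl:apply-templates select="//book" mode="author"/>
      </authors>
    </report>
  </xsl:template>
  <xsl:template match="book" mode="title">
    <xsl:copy-of select="title"/>
  </xsl:template>
  <xsl:template match="book" mode="author">
    <xsl:copy-of select="author"/>
  </xsl:template>
</xsl:stylesheet>
```

The two book templates have identical match patterns; the mode alone decides which one runs, which is what makes modes a substitute for explicit branching.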
Suppose you have to print all the titles and their respective ISBN codes, checking for the ISBN code validity at the same time. You could represent the desired result as follows:
<?xml version="1.0" encoding="utf-8"?>
<table>
  <th>ISBN number check failed</th>
  <tr>
    <td class="color:red;">1-56592-235-1</td>
  </tr>
</table>
<table>
  <th>ISBN number check passed</th>
  <tr>
    <td>1-56592-235-2</td>
  </tr>
  <tr>
    <td>0-471-40399-7</td>
  </tr>
</table>
Without knowing how to use keys and modes, you might implement the solution with the following stylesheet logic:
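<xsl:choose>
  <xsl:when test="">
    <!-- checksum failed: output the cell with class="color:red;" -->
  </xsl:when>
  <xsl:otherwise>
    <!-- checksum passed: output a plain cell -->
  </xsl:otherwise>
</xsl:choose>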
This example would construct the table by iterating on the ISBN nodes and choosing whether to output class="color:red;" on each pass. This would be easy if you weren't obliged to group the result and output all the failed ISBN codes first. For the purpose of grouping, the use of keys and modes leads to much simpler code, the alternatives being extension functions or chaining of two different XSLT stylesheets.
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes" encoding="utf-8"/>
  <!-- Key value is true when the ISBN fails the check and false when it passes -->
  <xsl:key name="kbook" match="book"
    use="boolean((substring(translate(@isbn, '-',''), 1,1) * 1 +
                  substring(translate(@isbn, '-',''), 2,1) * 2 +
                  substring(translate(@isbn, '-',''), 3,1) * 3 +
                  substring(translate(@isbn, '-',''), 4,1) * 4 +
                  substring(translate(@isbn, '-',''), 5,1) * 5 +
                  substring(translate(@isbn, '-',''), 6,1) * 6 +
                  substring(translate(@isbn, '-',''), 7,1) * 7 +
                  substring(translate(@isbn, '-',''), 8,1) * 8 +
                  substring(translate(@isbn, '-',''), 9,1) * 9) mod 11
                 - substring(translate(@isbn, '-',''),10,1))
         or (boolean((substring(translate(@isbn, '-',''), 1,1) * 1 +
                      substring(translate(@isbn, '-',''), 2,1) * 2 +
                      substring(translate(@isbn, '-',''), 3,1) * 3 +
                      substring(translate(@isbn, '-',''), 4,1) * 4 +
                      substring(translate(@isbn, '-',''), 5,1) * 5 +
                      substring(translate(@isbn, '-',''), 6,1) * 6 +
                      substring(translate(@isbn, '-',''), 7,1) * 7 +
                      substring(translate(@isbn, '-',''), 8,1) * 8 +
                      substring(translate(@isbn, '-',''), 9,1) * 9) mod 11 != 10)
             and (substring(translate(@isbn, '-',''),10,1)) = 'X')"/>
  <xsl:template match="/">
    <table>
      <th>ISBN number check failed</th>
      <xsl:apply-templates select="key('kbook',true())" mode="failed"/> ❶
    </table>
    <table>
      <th>ISBN number check passed</th>
      <xsl:apply-templates select="key('kbook',false())" mode="passed"/> ❷
    </table>
  </xsl:template>
  <xsl:template match="book" mode="failed"> ❸
    <tr>
      <td class="color:red;"> ❹
        <xsl:value-of select="@isbn"/>
      </td>
    </tr>
  </xsl:template>
  <xsl:template match="book" mode="passed"> ❺
    <tr>
      <td>
        <xsl:value-of select="@isbn"/>
      </td>
    </tr>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
This version processes both types of book nodes—those that did not pass checksum verification for their ISBN codes, and those that did—using separate templates for the failed and passed modes. The stylesheet outputs ISBN values in red for books with ISBN codes that fail the checksum test.
Line ❶ processes all book nodes that did not pass checksum verification for their ISBN codes. Line ❷ processes all book nodes that passed checksum verification for their ISBN codes. Line ❸ is triggered in the failed mode. Line ❹ marks the book with the incorrect ISBN in red. Line ❺ executes for books that passed the test.
Extending XSLT
Sometimes XSLT turns out to be too limited to express complex transformations. Two viable options then exist:
- Chain the execution of XSLTs instead of trying to do everything in one pass.
- Use common extension functions from the EXSLT package.
Chaining XSLT Execution
Contrary to what one might think, chaining XSLT stylesheets (using the output of one stylesheet transformation as the input for the next stylesheet in the chain) does not add much overhead if done properly. Although nearly all XSLT processors reconstruct the structure of the input document in memory for each pass, that process is not equivalent to the reconstruction of a DOM tree. Most XSLT processors use an internal format that may be a lot faster. In fact, a number of small XSLT stylesheets chained together can actually boost performance as compared to a single complex stylesheet. The author provides a Java library that can chain stylesheet execution with any JAXP-compliant XSLT processor.
Using Common Extension Functions
There is an effort to provide a more or less common set of extensions to XSLT with the corresponding reference implementations. Some of these functions already exist in various XSLT processors under different names.
The most notable function is node-set, which converts result tree fragments into node-sets. If you create a variable with a select attribute, it holds a node-set.
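A select-based declaration might look like the following sketch, which assumes the bookstore input used earlier; the variable name books and the english-titles wrapper are illustrative only. Because the variable holds a node-set, path steps and predicates can be applied to it directly.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes" encoding="utf-8"/>
  <!-- Declared with select: $books is a node-set of the original book elements -->
  <xsl:variable name="books" select="//book"/>
  <xsl:template match="/">
    <english-titles>
      <!-- A predicate and a child step applied straight to the variable -->
      <xsl:copy-of select="$books[@lang='en']/title"/>
    </english-titles>
  </xsl:template>
</xsl:stylesheet>
```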
If a variable is created with an embedded statement, it holds a so-called result tree fragment:
<xsl:variable name="foo">
  <xsl:copy-of select="/"/>
</xsl:variable>
Because XSLT allows more operations on node-sets, it is wise to use the select statement when possible instead of embedded statements. Otherwise, the node-set extension function would come to the rescue.
The node-set extension function exists for several processors: 4XSLT, Xalan-J, Saxon, jd.xslt, and libxslt, and you make it accessible to your stylesheets by including the namespace http://exslt.org/common.
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ext="http://exslt.org/common" version="1.0">
  <xsl:variable name="all">
    <xsl:copy-of select="child::*[1]"/>
  </xsl:variable>
  <xsl:template match="/">
    <root>
      <xsl:copy-of select="ext:node-set($all)"/>
    </root>
  </xsl:template>
</xsl:stylesheet>
Conclusion
Overall, as an occasional XSLT developer, try to keep the advantages of declarative and event-driven programming in mind, and be wary of falling into the trap of using the procedural or imperative programming techniques that you commonly use in standard programming languages.