2001 Conference Proceedings


Using XSLT to render accessible documents to the web

Dr. Carlos A. Velasco
GMD - German National Research Center for Information Technology
Institute for Applied Information Technology (FIT.HEB)
Schloß Birlinghoven, D53757 Sankt Augustin (Germany)
Carlos.Velasco-Nunez@gmd.de

CSUN 2001, 19-24/March/2001

Abstract:

This session introduces XML and its transformation language, XSLT. It discusses the characteristics of XML, how to incorporate accessibility requirements when defining XML vocabularies, and how the features of XSLT can help to render accessible documents on the Internet.

1. Introduction

The landscape of new languages based upon the Extensible Markup Language (XML) [1] of the World Wide Web Consortium (W3C) is growing at a fast pace. The flexibility of these markup languages makes them especially suitable for developing applications and services for upcoming technologies.

Given the great variety of applications and rendering devices foreseen, a key role will be played by XSL Transformations (XSLT), the transformation language of the Extensible Stylesheet Language (XSL), which transforms XML into other formats. XSLT matters because HTML holds a dominant position over any new technology trying to take over the net, especially on the client side, where the adoption of XML by most of the available user agents is slow. From the accessibility standpoint, there are many elements to be considered. This paper discusses issues related to the creation of accessible XML vocabularies, how to implement XSL Transformations, and some of the available tools.

2. XML and Accessibility

The Extensible Markup Language is a markup meta-language developed by the World Wide Web Consortium on the basis of the Standard Generalized Markup Language (SGML, ISO 8879) [2] to extend the capabilities of HTML and of the web. Although SGML is a widespread industry standard, in use since the late eighties and providing strong document-management capabilities, it was not developed with the Internet in mind. The flexibility of SGML came at the cost of complexity, and the W3C sought a simplification of SGML for web development.

As defined in the W3C recommendation, XML is a subset of SGML developed to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. Despite an initially slow take-up, the industry is adopting XML at an increasing pace, and XML will shortly become a de facto standard for e-commerce and distributed applications. Several features make the future of XML promising:

It is well suited for data interchange between different organizations and individuals. It will represent the «end of proprietary file formats,» where the structure of your documents is decided by the vendor of the tool used in your organization. Even if the Document Type Definition (DTD) is not available, XML documents can be read and exchanged, because they are standard Unicode text files and their well-formedness can be checked against the recommendation, which defines an unambiguous mechanism for constraining the structure. Thus XML bridges the gap between human-readable and machine-readable documents, allowing a smooth and seamless interchange of information.

It allows information to be stored in a hierarchical format. It must be stressed that XML is better adapted to exchanging information with object databases, although these have not yet reached the maturity and spread of relational databases.

It enables the use of smart software agents, and provides Internet and local search engines with meaningful elements to classify and sort documents with the assistance of the document markup and content. It also allows the use of meta-data [3,4].

It has a strict and consistent syntax that makes documents easy to process.

The need for a new meta-language to create new markup languages is justified by several problems presented by HTML: it is not extensible by the author (although browser manufacturers defined proprietary tags in the past); it is display-centric (a well-known accessibility hurdle); it is not directly reusable; it only provides one view of the data; it has little or no semantic structure; and it is not suitable for data exchange.

The W3C tried to eliminate many of the accessibility barriers presented by HTML's display-centric elements by creating Cascading Style Sheets [5] and deprecating most of the presentational elements in the latest specification of HTML [6]. XML goes one step further because it allows languages to be created with a clear distinction between content and presentation. This characteristic is emphasized in the latest draft of the document on XML accessibility published by the W3C [7]. The document outlines four general guidelines for developing accessible DTDs and accessible documents. It also defines an accessible document as a document that can be equally understood by its targeted audience regardless of the device used to access it.

From our point of view, these guidelines can be classified into two groups: those related to documentation access, and those related to design. Within the first group lies the need to export the semantics of the Document Type Definition (DTD) used. This implies documenting it in an accessible way (HTML or text), publishing the specifications in known repositories (Schema.net or XML.org), or making use of those already published by the W3C and others. The rest of the recommendations lead to a set of practical techniques to implement the guidelines:

Clearly identify multimedia elements and their formats. If the use of W3C recommendations (SMIL [8], SVG [9]) is not feasible, provide alternative content in the manner of the corresponding tag of HTML or XHTML [10]. E.g.

<movie src="madrid.mpeg" mmedia="yes" type="video/x-mpeg2">
  <!-- A full text caption -->
  <desc captionSRC="madrid.xml" />
  <!-- A short text alternative -->
  <alt> A documentary movie for tourists visiting Madrid. </alt>
</movie>

Use meaningful names for elements and attributes, so that they can be understood even without the documentation: section, chapter, list, item, news, etc.

Do not include presentation elements: fonts, colors, backgrounds, margins, etc.

Define elements with ID attributes to ease navigation. Use XLink [11] when feasible.

Define elements that can be grouped and indexed.

For a set of related documents, define a hierarchical structure in an external document.

Enable the use of style sheets (CSS2 or XSL + Transformations) and ensure that the full content is accessible independently of the rendering technique.

Include language information (the attribute xml:lang can be declared as IMPLIED).

The problem nowadays is that user agents do not yet implement XML or any of its derived languages. However, there is a language that can fill the gap: XSL Transformations (XSLT). We will describe the main characteristics of XSLT and how they facilitate the implementation of accessibility guidelines.
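
Before turning to XSLT, the design techniques listed above can be illustrated with a small document and its DTD. This is only a sketch: the vocabulary (article, section, title, para) and its content are hypothetical and are not taken from any published DTD. It uses descriptive element names, carries no presentation markup, gives every section an ID attribute, and declares xml:lang as an optional (#IMPLIED) attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical vocabulary, invented for this illustration -->
<!DOCTYPE article [
  <!ELEMENT article (title, section+)>
  <!ELEMENT section (title, para+)>
  <!ELEMENT title   (#PCDATA)>
  <!ELEMENT para    (#PCDATA)>
  <!-- Language information is optional on the root and on each section -->
  <!ATTLIST article xml:lang CDATA #IMPLIED>
  <!-- Every section carries a unique ID to ease navigation -->
  <!ATTLIST section id       ID    #REQUIRED
                    xml:lang CDATA #IMPLIED>
]>
<article xml:lang="en">
  <title>Visiting Madrid</title>
  <section id="museums">
    <title>Museums</title>
    <para>Opening hours and ticket prices.</para>
  </section>
  <section id="bienvenida" xml:lang="es">
    <title>Bienvenida</title>
    <para>Bienvenidos a Madrid.</para>
  </section>
</article>
```

Nothing in this markup says how a section title should look; that decision is deferred to a style sheet, which is the separation between content and presentation that the guidelines ask for.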

3. XSL Transformations

XSL Transformations (XSLT) are described in the W3C recommendation [12]. The idea of transforming XML documents might seem awkward, but it is a very important feature for two reasons that benefit accessibility: separation between data and presentation, and seamless transmission of data between applications.

Concerning the first point, it must be stressed that the World Wide Web is becoming a complex environment where the number of web-sites needing interaction with databases is growing daily. XSL Transformations allow the three key elements (content, navigation, and interface, i.e. presentation or style) to be stored in separate repositories, thus easing the maintenance of e-commerce web-sites. The second reason why transformations are important is the transmission of data between applications. This may seem machine-centric, but it in fact influences accessibility, because the prevalent position of HTML on the Internet demands a transformation from the original XML files to «usable» rendered HTML files.

The transformation process is twofold. First, the style sheet performs a structural transformation on the data to obtain a suitable version, and then the data are formatted. Originally, XSLT was part of the Extensible Stylesheet Language (XSL) [13], but in order to separate the two processes, XSLT became a separate recommendation, whereas the formatting part became XSL Formatting Objects (XSL-FO).
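
As an illustration of the structural step, the following style sheet is a minimal sketch that turns a hypothetical vocabulary into an HTML tree. It assumes an article element containing a title and several section elements, each with an id attribute, a title and para children; these names are assumptions made for this example. The formatting itself is left to the user agent or to a CSS style sheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialize the result tree as HTML -->
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- The document element becomes the skeleton of the HTML page -->
  <xsl:template match="/article">
    <html>
      <!-- Carry the language information over, if the source declares it -->
      <xsl:if test="@xml:lang">
        <xsl:attribute name="lang">
          <xsl:value-of select="@xml:lang"/>
        </xsl:attribute>
      </xsl:if>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="section"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each section becomes a heading followed by its paragraphs -->
  <xsl:template match="section">
    <h2 id="{@id}"><xsl:value-of select="title"/></h2>
    <xsl:apply-templates select="para"/>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

The source document contains no headings or paragraphs in the HTML sense; the style sheet alone decides that a section title is rendered as a second-level heading, so the same source could be mapped to a different structure by another style sheet.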

There are three characteristics of XSLT that promote the creation of accessible documents:

It allows different processing modes for the same document: table of contents, document and index.

It allows dynamic document rendering, depending on user profiles and preferences (this feature will be of importance when using Composite Capabilities/Preference Profiles [14] in the near future).

It is template based, and its modularity allows code to be reused. Furthermore, different vocabularies can coexist because different namespaces [15] can be defined.
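
The first two characteristics can be sketched with a single style sheet. The example below processes the same section elements twice, once in a mode named toc to build a table of contents and once in the default mode to render the body. A top-level parameter, here called show-toc, stands in for a user preference; the parameter name, its values and the article vocabulary are assumptions made for this illustration, and how the parameter is supplied depends on the XSLT processor in use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- Hypothetical user preference; the caller may override the default -->
  <xsl:param name="show-toc" select="'yes'"/>

  <xsl:template match="/article">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <!-- First pass over the sections: table of contents -->
        <xsl:if test="$show-toc = 'yes'">
          <ul>
            <xsl:apply-templates select="section" mode="toc"/>
          </ul>
        </xsl:if>
        <!-- Second pass over the same sections: document body -->
        <xsl:apply-templates select="section"/>
      </body>
    </html>
  </xsl:template>

  <!-- In "toc" mode a section is reduced to a link to its heading -->
  <xsl:template match="section" mode="toc">
    <li><a href="#{@id}"><xsl:value-of select="title"/></a></li>
  </xsl:template>

  <!-- In the default mode the same section is rendered in full -->
  <xsl:template match="section">
    <h2 id="{@id}"><xsl:value-of select="title"/></h2>
    <xsl:for-each select="para">
      <p><xsl:value-of select="."/></p>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the table of contents is generated from the same nodes as the body, the two cannot drift apart, and a user who prefers a shorter page can switch the list off without any change to the source document.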

3.1 Web-site structural elements

As mentioned in the introduction, XSLT can help to implement accessibility elements within new XML DTDs. To attain this objective, it is recommended to split the web-site into three elements (Figure 1):

Actual page content, i.e. the main content of the page to be rendered.

Navigational elements, i.e. the interface elements that allow the user to access different sections of the web-site. The tree structure of XML documents is especially well suited to defining web-site structures.

Miscellaneous elements, i.e. banners, news, and other accessory elements in the site.

Figure 1: A typical page structure for a commercial web-site.

XSLT relies on another W3C recommendation, XPath [16], a language for addressing parts of an XML document. XPath operates on the abstract, logical structure of an XML document to specify the locations of structures or data found in it, and it is a key element for defining navigational elements.

The three elements described above shall not contain any presentation features. Figure 1 presents a typical layout, but that layout description shall not be included in them. The two-step transformation process (see Figure 2) consists in merging this information with an XSLT processor:
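
As a sketch of how XPath can drive navigational elements, the style sheet below reads a hypothetical site map in which page elements are nested to mirror the structure of the web-site. Given the identifier of the current page, it uses the ancestor axis to build the path from the home page and the child axis to build a local menu. The site map format, the attribute names and the parameter named current are all assumptions made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed input (hypothetical site map):
     <site>
       <page id="home" title="Home" href="index.html">
         <page id="tours" title="Tours" href="tours.html">
           <page id="madrid" title="Madrid" href="madrid.html"/>
         </page>
       </page>
     </site>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Identifier of the page being rendered (hypothetical parameter) -->
  <xsl:param name="current" select="'madrid'"/>

  <xsl:template match="/site">
    <!-- Locate the node that describes the current page -->
    <xsl:variable name="here" select="//page[@id = $current]"/>

    <!-- Ancestors, visited in document order, give the path from the home page -->
    <p>
      <xsl:for-each select="$here/ancestor::page">
        <a href="{@href}"><xsl:value-of select="@title"/></a>
        <xsl:text> &gt; </xsl:text>
      </xsl:for-each>
      <xsl:value-of select="$here/@title"/>
    </p>

    <!-- Children of the current page, if any, form the local menu -->
    <xsl:if test="$here/page">
      <ul>
        <xsl:for-each select="$here/page">
          <li><a href="{@href}"><xsl:value-of select="@title"/></a></li>
        </xsl:for-each>
      </ul>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

With the site map shown in the comment and the default parameter value, the path consists of links to Home and Tours followed by the plain text Madrid. The navigation is expressed as text links only, so it remains usable in any user agent.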

Transformation process: multiple input documents are combined with the document() function, and the style sheets themselves can be assembled from modules with <xsl:include> or <xsl:import>. The transformation process can include selection, aggregation, grouping, sorting and arithmetical operations. The objective is to create the required structure for the document to be rendered.

Rendering process: style is applied to the resulting document so that it can be rendered in the user agent. This process includes presentation and layout elements. XSLT can output an XML document, an HTML document or a text document. The W3C recommendation does not support multiple output documents, but some processors and publishing frameworks do.

Finally, it must be remarked that when rendering HTML documents, a well-designed DTD allows a smooth implementation of the Web Content Accessibility Guidelines [17].
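
As an example of this last remark, the movie element shown in section 2 already carries a short text alternative and a pointer to a full text caption, so a single template suffices to carry both over to HTML. The sketch below maps the element to an HTML OBJECT element, whose content is what a user agent presents when it cannot play the movie. The choice of OBJECT and the wording of the link text are assumptions of this example, not part of the original vocabulary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Render a movie element with its text alternatives -->
  <xsl:template match="movie">
    <object data="{@src}" type="{@type}">
      <!-- The content of OBJECT is the fallback for the movie -->
      <p>
        <!-- Short text alternative, with surrounding white space trimmed -->
        <xsl:value-of select="normalize-space(alt)"/>
        <!-- Link to the full text caption, when one is declared -->
        <xsl:if test="desc/@captionSRC">
          <xsl:text> </xsl:text>
          <a href="{desc/@captionSRC}">Full text caption</a>
        </xsl:if>
      </p>
    </object>
  </xsl:template>

</xsl:stylesheet>
```

Because the alternative content is part of the vocabulary, the style sheet never has to invent it; a DTD that declared the alt element as required would guarantee that every rendered movie has a text equivalent.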

Figure 2: The double transformation process of an XML document by XSLT.
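
The merging step of Figure 2 can be sketched as follows. The style sheet is applied to the page content and pulls in the navigation and the news with the document() function; the result is a plain HTML structure whose layout is delegated to a CSS style sheet in the rendering step. The file names navigation.xml, news.xml and site.css, the element names inside them, and the use of ISO formatted dates (which sort correctly as text) are all assumptions made for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- Hypothetical companion documents; relative names are resolved
       against the location of this style sheet -->
  <xsl:variable name="nav"  select="document('navigation.xml')"/>
  <xsl:variable name="news" select="document('news.xml')"/>

  <xsl:template match="/article">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
        <!-- Presentation and layout are left to the rendering step -->
        <link rel="stylesheet" type="text/css" href="site.css"/>
      </head>
      <body>
        <!-- Navigational elements, taken from the site map -->
        <ul>
          <xsl:for-each select="$nav/site/page">
            <li><a href="{@href}"><xsl:value-of select="@title"/></a></li>
          </xsl:for-each>
        </ul>

        <!-- Actual page content, taken from the source document -->
        <h1><xsl:value-of select="title"/></h1>
        <xsl:for-each select="section">
          <h2><xsl:value-of select="title"/></h2>
          <xsl:for-each select="para">
            <p><xsl:value-of select="."/></p>
          </xsl:for-each>
        </xsl:for-each>

        <!-- Miscellaneous elements: news items, most recent first -->
        <h2>News</h2>
        <ul>
          <xsl:for-each select="$news/news/item">
            <xsl:sort select="@date" order="descending"/>
            <li><xsl:value-of select="."/></li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

None of the three source documents knows about the others or about the page layout, so each can be maintained separately, and the reading order of the result (navigation, content, news) is fixed in one place.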

3.2 Publishing Frameworks

The previous sections have emphasized the powerful tools that XSLT puts in the hands of web designers. However, part of the equation is missing. The use of XSLT requires web servers to respond to document requests not with the document itself, but with a «published» version of the document. Furthermore, commercial web-sites store their information in database repositories, and different web-site components are retrieved dynamically. Web servers therefore face increasing performance demands to cope with these overheads.

Selecting a publishing framework is a difficult task. XMLsoftware.com lists a set of available publishing tools. Most of them are in a beta state or in their first version, and therefore they are not very stable. Another problem is that some of them do not follow the latest version of the W3C recommendations, or are not compatible with other XML tools and APIs.

There are tools available that perform transformations off-line, although these are of little interest outside research environments. Publishing frameworks are characterized by the fact that they perform their transformations dynamically. The most reliable choice is Cocoon, a set of Open Source tools that run on any web server able to handle Java Servlets. Cocoon is developed by the Apache Software Foundation within a project founded by Stefano Mazzocchi. It builds on three components: Xerces, an XML parser in Java or C++; Xalan, an XSLT style sheet processor in Java or C++; and Apache FOP (Formatting Objects Processor). On top of them Cocoon adds its own Java technology for building web applications with XML content: XSP. It also allows different style sheets to be used depending on the user agent, leaving it up to the designer how to render the document.

It must be stressed that these tools do not perform document validation, because of the unnecessary overhead; they therefore require only well-formed documents. This also lightens the designer's workload, although the publication of XML Schema [18] simplifies the definition of new document types.
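
One standard way for a document to name different style sheets for different kinds of output is the xml-stylesheet processing instruction with its media pseudo-attribute, defined in the W3C recommendation on associating style sheets with XML documents. The fragment below is a generic sketch and is not taken from the Cocoon documentation: the style sheet file names are hypothetical, the media values are borrowed from the CSS2 media types, and the way a particular framework matches a user agent to a media value is a matter of its own configuration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Style sheet intended for graphical browsers (hypothetical file name).
     The type value text/xsl is the one in common use for XSLT. -->
<?xml-stylesheet href="page-html.xsl" type="text/xsl" media="screen"?>
<!-- Style sheet intended for speech output (hypothetical file name) -->
<?xml-stylesheet href="page-aural.xsl" type="text/xsl" media="aural"?>
<article xml:lang="en">
  <title>Visiting Madrid</title>
</article>
```

The document itself stays the same for every user agent; only the choice of style sheet changes, which is what allows a rendering suited to a particular device or assistive technology.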

4. Conclusions

In this paper we have reviewed accessibility issues concerning XML and its transformation language XSLT. As happens with any new technology, it can boost accessibility to new online applications and services when properly used, or it can pose insurmountable hurdles for people with special needs when used without considering their needs. From the previous sections, we can conclude that these technologies:

Bibliography

[1] Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation (2000). Bray T, Paoli J, Sperberg-McQueen C M and Maler E (Eds.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/xml-rec
[2] Goldfarb C F (1991), The SGML Handbook. Oxford University Press.
[3] Resource Description Framework (RDF) Model and Syntax Specification (1999). Lassila O and Swick R R (Eds.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/REC-rdf-syntax/
[4] Dublin Core Metadata Element Set, Version 1.1: Reference Description (1999). Dublin Core Metadata. http://purl.org/DC/documents/rec-dces-19990702.htm
    Weibel S, Kunze J, Lagoze C and Wolf M (1998), Dublin Core Metadata for Resource Discovery, Internet RFC 2413. The Internet Society. http://www.ietf.org/rfc/rfc2413.txt
[5] Cascading Style Sheets, level 2. CSS2 Specification. W3C Recommendation (1998). Bos B, Lie H W, Lilley C and Jacobs I (Eds.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/REC-CSS2
[6] HTML 4.01 Specification, W3C Recommendation (1999). Raggett D, Le Hors A and Jacobs I (Eds.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/html401
[7] XML Accessibility Guidelines (2000). Dardailler D (Ed.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/WAI/PF/xmlgl.htm
[8] Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, W3C Recommendation (1998). Hoschka P (Ed.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/REC-smil/
[9] Scalable Vector Graphics (SVG) 1.0 Specification, W3C Candidate Recommendation (2000). Ferraiolo J (Ed.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/SVG/
[10] XHTML 1.0: The Extensible HyperText Markup Language. A Reformulation of HTML 4 in XML 1.0. W3C Recommendation (2000). Pemberton S et al (Eds.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/xhtml1/
[11] XML Linking Language (XLink) Version 1.0, W3C Proposed Recommendation (2000). DeRose S, Maler E and Orchard D (Eds.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/xlink/
[12] XSL Transformations (XSLT) Version 1.0, W3C Recommendation (1999). Clark J (Ed.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/xslt/
[13] Adler S, Berglund A, Caruso J, Deach S, Grosso P, Gutentag E, Milowski A, Parnell S, Richman J and Zilles S (2000), Extensible Stylesheet Language (XSL) Version 1.0, W3C Candidate Recommendation. World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/xsl/
[14] Composite Capabilities/Preference Profiles: Requirements and Architecture, W3C Working Draft (2000). Nilsson M, Hjelm J and Ohto H (Eds.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/CCPP-ra/
[15] Namespaces in XML, W3C Recommendation (1999). Bray T, Hollander D and Layman A (Eds.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/REC-xml-names/
[16] XML Path Language (XPath) Version 1.0, W3C Recommendation (1999). Clark J and DeRose S (Eds.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/xpath/
[17] Web Content Accessibility Guidelines 1.0, W3C Recommendation (1999). Chisholm W, Vanderheiden G and Jacobs I (Eds.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/WAI-WEBCONTENT
[18] XML Schema Part 0: Primer, W3C Candidate Recommendation (2000). Fallside D C (Ed.), World Wide Web Consortium (MIT, INRIA, KEIO). http://www.w3.org/TR/xmlschema-0/





Reprinted with author(s) permission. Author(s) retain copyright.