2002 Conference Proceedings



XML AS ENABLING TECHNOLOGY: EMERGING DEVELOPMENTS IN WEB ACCESSIBILITY

Kynn Bartlett, Director, Accessible Web Authoring
HTML Writers Guild, Inc.
Resources and Education Center, HTML Writers Guild
110 E. Wilshire Avenue, Suite G-1
Fullerton, California 92832
(714) 526-5656
Fax (714) 526-4972
kynn@idyllmtn.com 
http://www.hwg.org/

WHAT IS XML?

Extensible Markup Language (XML) is not quite a language itself, but rather a set of rules and restrictions for constructing other languages: a metalanguage, or data format. Files created according to these rules can be processed by XML-aware programs, which means you can use similar tools and processes with any XML-based language.

XML is structured and hierarchical in nature, which means it looks like a tree of data, with child and parent nodes -- similar to a file structure on a hard drive. Like HTML, XML uses the concept of "tags" which surround text and define elements. Elements can have attributes, text values, and child elements.

The rules for defining an XML language are encoded in an XML Document Type Definition (DTD), or in an XML Schema. A DTD is a file which describes what types of elements and attributes constitute the language and what values they can take. It's interesting to note that a DTD (or Schema) is optional; as long as a document follows the rules of XML, it's an XML document even if the specifics of the language aren't fully defined.

Here is a simple example of an XML file:

   <?xml version="1.0"?>
   <family>

     <human sex="male">
       <name>Kynn</name>
       <age>33</age>
     </human>

     <human sex="female">
       <name>Liz</name>
       <age>unknown</age>
     </human>

     <dog sex="male">
       <name>Kim</name>
       <age>12</age>
     </dog>

     <dog sex="female">
       <name>Angie</name>
       <age>12</age>
     </dog>

     <dog sex="female">
       <name>Nying</name>
       <age>12</age>
     </dog>

   </family>

This file defines a list of family members -- two humans and three dogs -- and lists their genders, names, and ages. The tagging system used here should be familiar to anyone who has used HTML before, although the elements themselves are not HTML. In this case, this would be something like "Family Markup Language" and we may or may not have a formal DTD describing the language.
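
To make the tree structure concrete, here is a minimal sketch that reads the same data with Python's standard xml.etree.ElementTree parser, an XML-aware tool that knows nothing about "Family Markup Language" in particular. The function name list_members and the compact layout of the sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The family document from above, written compactly.
FAMILY_XML = """<?xml version="1.0"?>
<family>
  <human sex="male"><name>Kynn</name><age>33</age></human>
  <human sex="female"><name>Liz</name><age>unknown</age></human>
  <dog sex="male"><name>Kim</name><age>12</age></dog>
  <dog sex="female"><name>Angie</name><age>12</age></dog>
  <dog sex="female"><name>Nying</name><age>12</age></dog>
</family>"""

def list_members(xml_text):
    """Return (element name, sex attribute, name text, age text) per child."""
    root = ET.fromstring(xml_text)  # root is the <family> element
    return [
        (member.tag, member.get("sex"), member.findtext("name"), member.findtext("age"))
        for member in root          # iterating an element yields its children
    ]

members = list_members(FAMILY_XML)
for kind, sex, name, age in members:
    print(f"{kind}: {name} ({sex}, age {age})")
```

The parser needs no DTD to do this; it relies only on the document following the rules of XML.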

XML and HTML both come from the same source -- SGML, Standard Generalized Markup Language, which is also a metalanguage. XML is simpler and more restrictive than SGML. The "rules" for XML include:

  1. Documents must be "well formed" meaning they have a single root and the beginning and ending tags are properly nested.
  2. Characters such as < or >, which have special meaning in markup, must be "escaped" with special codes if they're used in text.
  3. Attribute values must be quoted, such as <dog sex="male"> -- <dog sex=male> is not proper XML.
  4. Element names are case-sensitive, meaning that <DOG> and <dog> are different elements (and need different closing tags).
  5. Closing tags are required on all elements; empty elements can be closed with a slash inside the opening tag, such as <br/>.
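
These rules can be observed by handing small fragments to an XML parser. The following is a sketch using Python's standard library; the helper is_well_formed and the sample fragments are assumptions made for illustration, not part of any specification.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "quoted attribute": '<dog sex="male"></dog>',
    "unquoted attribute": "<dog sex=male></dog>",
    "mismatched case": "<DOG></dog>",
    "improper nesting": "<family><dog></family></dog>",
    "two roots": "<dog/><dog/>",
    "unescaped <": "<name>a < b</name>",
    "empty element": "<family><dog/></family>",
}
results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'rejected'}")

# Rule 2: escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape("a < b")
print(escaped)
```

Only the "quoted attribute" and "empty element" fragments are accepted; wrapping the escaped text in a <name> element would be accepted as well.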

XHTML: HTML BY XML RULES

Extensible Hypertext Markup Language (XHTML) is simply HTML rewritten to conform to the rules given above. This means that instead of writing <BODY> or <Body>, authors writing in XHTML would use <body>. (Why lowercase? When the XHTML specification was written, the case-sensitive nature of XML meant that a standard way of writing HTML tags had to be chosen, and so it was arbitrarily decided that XHTML would be all lower-case. It wasn't quite a coin-toss, but it was close.)

XHTML is stricter than HTML, because of the rules listed above -- for example, you can't simply "forget" to close a <p> element. Also, all attribute values need to be quoted, and empty elements, such as <img>, <hr>, or <br> -- which don't "contain" any content -- have to be written with slashes: <img/>, <hr/>, and <br/>.

The tricky part about XHTML, however, is that it's not inherently backwards compatible with HTML. Tags like <hr/> confuse older HTML browsers, which don't understand the XML way of closing an element by putting a slash in the tag -- they think it's a tag called "aitch arr slash", which they ignore since it's not understood. However, by adding a space before the slash -- so it reads <hr /> ("aitch arr space slash") -- the HTML browser is pleased and displays the HTML naturally.
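
Some XML tools write the extra space for you. As a small sketch, the serializer in Python's standard xml.etree.ElementTree module (as currently implemented in CPython) emits empty elements in the space-slash form; the page fragment built here is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

body = ET.Element("body")
first = ET.SubElement(body, "p")
first.text = "This is my page!"
ET.SubElement(body, "hr")  # empty element: no text, no children
ET.SubElement(body, "br")  # another empty element

# Serialize the tree back to markup as a Python string.
markup = ET.tostring(body, encoding="unicode")
print(markup)
```

Other serializers may produce <hr/> without the space, so it is worth checking the output of whichever tool generates your XHTML.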

Here's a short and simple XHTML page:

   <?xml version="1.0"?>
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
       "DTD/xhtml1-transitional.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
     <head>
       <title>This is my page</title>
     </head>
     <body>
       <p>
         This is my page!
       </p>
       <hr />
       <p>
         <a href="mailto:kynn@idyllmtn.com">Kynn</a>
       </p>
     </body>
   </html>

By itself, XHTML is not necessarily any more accessible than HTML; depending on how you create the page and what elements and attributes you use, you could create a highly accessible page, or a highly inaccessible page. The use of XHTML itself (or XML) does not automatically guarantee a page's accessibility.
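
As an illustration, the page in the sketch below is perfectly well-formed, yet one of its images has no text equivalent. The function images_missing_alt, the file names, and the sample page are assumptions made for this example; a missing alt attribute is only one of many possible accessibility problems.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

def images_missing_alt(xhtml_text):
    """Return the src of every <img> that has no alt attribute."""
    root = ET.fromstring(xhtml_text)
    missing = []
    for element in root.iter():
        # Match <img> with or without the XHTML namespace.
        if element.tag in ("img", XHTML_NS + "img") and element.get("alt") is None:
            missing.append(element.get("src"))
    return missing

# Hypothetical page: well-formed, but only the first image has alt text.
page = """<html xml:lang="en">
  <head><title>This is my page</title></head>
  <body>
    <p><img src="logo.png" alt="Company logo" /></p>
    <p><img src="photo.jpg" /></p>
  </body>
</html>"""

missing = images_missing_alt(page)
print(missing)
```

The parser accepts the whole page without complaint; finding the inaccessible image takes a separate check.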

XSLT: TRANSFORMING XML

Another XML-based language is XSL Transformations, or XSLT for short (XSL stands for Extensible Stylesheet Language). The name is a bit misleading, especially if you are used to Cascading Style Sheets (CSS) -- XSLT actually specifies a way to transform from one XML-based format to another. For example, you could write an XSLT transformation which changes from Family Markup Language to XHTML, as they both follow the rules of XML languages. In such a case, Family ML would be a "data language" that contains information on the specifics of the content's structure, while XHTML would be a "delivery language" sent to a browser for display to a user.

This type of transformation capability has a number of useful functions in web accessibility, specifically in the ability to separate content (data) from presentation (delivery).

While it's beyond the scope of this paper to describe the specifics of XSLT syntax, it's possible to define the types of transformations possible. XSLT allows for complete restructuring of the content, adding or removing content, selecting specific pieces or large portions of the data document and creating an entirely new delivery document. The use of differing stylesheets allows for that content to be transformed for any number of purposes, including specialized interfaces for users with specific requirements.
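
The idea of turning a data document into a delivery document can be sketched without any XSLT at all. The example below is not XSLT; it uses Python's standard xml.etree.ElementTree as a stand-in to show the same kind of restructuring, and the function name family_to_xhtml, the page title, and the chosen list layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

FAMILY_XML = """<family>
  <human sex="male"><name>Kynn</name><age>33</age></human>
  <dog sex="female"><name>Angie</name><age>12</age></dog>
</family>"""

def family_to_xhtml(xml_text):
    """Build an XHTML-style list from a Family ML document (data -> delivery)."""
    family = ET.fromstring(xml_text)
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = "Family members"
    body = ET.SubElement(html, "body")
    listing = ET.SubElement(body, "ul")
    for member in family:
        # Select pieces of the data document and restructure them as text.
        item = ET.SubElement(listing, "li")
        item.text = (
            f"{member.findtext('name')} "
            f"({member.tag}, {member.get('sex')}, age {member.findtext('age')})"
        )
    return ET.tostring(html, encoding="unicode")

xhtml = family_to_xhtml(FAMILY_XML)
print(xhtml)
```

An XSLT stylesheet would express the same mapping declaratively, as template rules matched against the elements of the data document, rather than as a loop.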

CC/PP AND ALTERNATE USER INTERFACES

Composite Capability/Preference Profiles -- known as CC/PP -- is a W3C specification under development which allows users to record information about the way their system can display or gather information, and then transmit that to a web server. For example, a CC/PP profile could contain a statement, "I prefer not to see images" or "I don't have a sound card in my computer."

A CC/PP-enabled server is then able to respond with an appropriate version of the web page, tailored to the user's stated needs and desires. For example, it could remove images and send textual equivalents, if the user requested "no images."
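
The "no images" case might look something like the sketch below. Real CC/PP profiles are expressed in RDF; the plain dictionary and its "images" key used here are hypothetical stand-ins, as are the function name tailor_page and the sample markup.

```python
import xml.etree.ElementTree as ET

def tailor_page(xhtml_text, profile):
    """Return the page adjusted to a (hypothetical) preference dictionary."""
    root = ET.fromstring(xhtml_text)
    if not profile.get("images", True):
        # Visit every element as a possible parent so that each <img>
        # child can be swapped for its textual equivalent.
        for parent in list(root.iter()):
            for index, child in enumerate(list(parent)):
                if child.tag == "img":
                    replacement = ET.Element("span")
                    replacement.text = child.get("alt", "")
                    replacement.tail = child.tail  # keep the text after the image
                    parent[index] = replacement
    return ET.tostring(root, encoding="unicode")

page = '<body><p><img src="logo.png" alt="Company logo" /> Welcome</p></body>'

text_only = tailor_page(page, {"images": False})
unchanged = tailor_page(page, {"images": True})
print(text_only)
```

Note that the replacement is only as good as the alternative text the author supplied; an image without an alt attribute becomes an empty span.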

CC/PP was originally developed with the W3C by developers of cellular phones and programs that run on them, which need to know the physical characteristics of tiny displays in order to effect the best presentation for hundreds of different phone types. The same technology can also be used to deliver web pages to people with disabilities -- if they are willing to provide appropriate CC/PP profiles to those servers.

TWENTY-FIRST CENTURY ACCESSIBLE WEB SERVICES

Okay, so let's put it all together and describe a system by which XML, XSLT, XHTML, and CC/PP can work together to produce an accessible user interface for users with specific needs.

  1. Data on the server is stored in an XML format. This doesn't have to be any specific language; it could be something specific to this particular site and application, such as Family Markup Language, or it could be a generic "device independent markup language" such as Reef's CORAL language. It's important that any such language contain rich semantics so that appropriate transformations can be done -- for example, providing alternative text for images.
  2. XSLT transformation rules are written to convert the XML data into one or more delivery languages, such as XHTML, or WML (the XML-based language understood by cell phones).
  3. Each transformation is designed to meet the needs of specific user types -- for example, a transformation for a screenreader could remove images (replacing them with alternative text), re-order the content so the most important parts are first, and provide an initial table of contents for the page to give context. A user interface designed for someone with limited dexterity would be quite different.
  4. When the user hits the site with a CC/PP-enabled browser, the user's profile is transmitted to the server. The server selects the correct XSLT transformation and applies it, and the resulting delivery document is sent back for display. Users without CC/PP capability are provided with a default, highly accessible version and given the option of registering (via cookie or session identifier) a profile on the server.
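
The four steps above can be sketched end to end. This is a simplified illustration under stated assumptions, not a CC/PP or XSLT implementation: the profile is a hypothetical dictionary with an "interface" key, the two transformations are plain Python functions standing in for XSLT stylesheets, and all names (graphical_view, screenreader_view, serve) are invented for the example.

```python
import xml.etree.ElementTree as ET

# Step 1: data stored in an XML format (a shortened family document).
FAMILY_XML = """<family>
  <human sex="male"><name>Kynn</name><age>33</age></human>
  <dog sex="male"><name>Kim</name><age>12</age></dog>
  <dog sex="female"><name>Angie</name><age>12</age></dog>
</family>"""

# Steps 2 and 3: one transformation per kind of user interface.
def graphical_view(family):
    """Delivery document for a visual browser: one table row per member."""
    table = ET.Element("table")
    for member in family:
        row = ET.SubElement(table, "tr")
        for value in (member.findtext("name"), member.tag, member.findtext("age")):
            ET.SubElement(row, "td").text = value
    return table

def screenreader_view(family):
    """Delivery document for speech: a summary first, then a plain list."""
    body = ET.Element("div")
    ET.SubElement(body, "p").text = f"{len(family)} family members follow."
    listing = ET.SubElement(body, "ul")
    for member in family:
        ET.SubElement(listing, "li").text = (
            f"{member.findtext('name')}, {member.tag}, age {member.findtext('age')}"
        )
    return body

# Hypothetical mapping from a profile's "interface" value to a transformation.
TRANSFORMATIONS = {"graphical": graphical_view, "screenreader": screenreader_view}
DEFAULT_TRANSFORMATION = screenreader_view  # stand-in for the accessible default

# Step 4: select a transformation from the profile and apply it.
def serve(xml_text, profile=None):
    """Return the delivery markup for the given profile (or the default)."""
    family = ET.fromstring(xml_text)
    interface = (profile or {}).get("interface")
    transform = TRANSFORMATIONS.get(interface, DEFAULT_TRANSFORMATION)
    return ET.tostring(transform(family), encoding="unicode")

graphical = serve(FAMILY_XML, {"interface": "graphical"})
spoken = serve(FAMILY_XML, {"interface": "screenreader"})
fallback = serve(FAMILY_XML)  # no profile supplied
print(spoken)
```

The same source data yields a table for one user and a summary followed by a list for another, and a request without a profile falls back to the default.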

The accessibility benefit of this approach, which is a single source, multiple interface model (rather than the traditional single source, single interface model of earlier web design), is that it allows each user to receive an optimal user interface -- one which is not merely "accessible" but also "usable." Rather than the screenreader version being a derivative of the graphical user's design, the screenreader user receives her own interface to the same content, made to work with her needs and preferences. Conflicts between the accommodations necessary for different disability types can be mitigated with such an approach, since different transformations can be used for different users.

REFERENCES

XML 1.0, W3C Recommendation, 10 February 1998, updated 6 October 2000 http://www.w3.org/TR/2000/REC-xml-20001006

XHTML 1.0, W3C Recommendation, 26 January 2000 http://www.w3.org/TR/xhtml1

XSL Transformations, W3C Recommendation, 16 November 1999 http://www.w3.org/TR/xslt

CC/PP Structure and Vocabularies, W3C Working Draft, 15 March 2001 http://www.w3.org/TR/CCPP-struct-vocab/

What is CC/PP?, Kynn Bartlett, 1999 http://www.ccpp.org/

Principles of Device Independence, W3C Working Draft, 18 September 2001 http://www.w3.org/TR/2001/WD-di-princ-20010918/




Reprinted with author(s) permission. Author(s) retain copyright.