2001 Conference Proceedings



Taras Kowaliw
Adaptive Technology Resource Centre, University of Toronto
E-mail: taras.kowaliw@utoronto.ca


This paper begins with a summary of the problems associated with the communication of mathematics in an auditory modality. Some solutions to these problems are presented, leading to a discussion of the Mathematical Audio Browser (MAB), an application currently being developed at the Adaptive Technology Resource Centre (ATRC) designed to implement those solutions. The result promises to be a robust application for the presentation of mathematical web pages (XHTML and MathML documents), supporting the features needed for effective communication within the Screen Reading paradigm.

Mathematicians rarely just talk about their work; more commonly they are found poised around a blackboard, or hovering about the closest outcropping bemoaning the poor quality of cocktail napkins. Mathematics is a primarily visual endeavor; the addition of natural language is an afterthought, and mostly an incomplete one. This poses a difficult problem for accessibility: how can one effectively communicate a primarily visual matter to the visually disabled?

Mathematics on the Web...

Currently, mathematical documents (and mathematically rich technical documents in general) are usually encoded in TeX, a technical typesetting language. TeX, however, does not live within the Internet paradigm. As the Internet grows as a pedagogical tool, so does the need for mathematical notation to be embedded as an element of a standard web document.

The solutions currently used are not palatable to the user with visual disabilities. GIF images (screen shots taken from some mathematical authoring tool) are most often entirely inaccessible. In the rare case that the author includes meaningful ALT text, the linguistic conventions of the description change from author to author. Linked documents encoded in PostScript are less accessible still, since no ALT text can be included at all. Luckily, these options are not palatable to visual users either, due to the need for additional software, or simply the awkwardness of the presentation.

The most likely replacement for these approaches is MathML, the W3C's new XML language for mathematical expressions. MathML is a fully robust markup for mathematical expressions, capable of visually presenting virtually any mathematical notation. Full browser support also appears likely in the near future, judging from the claims of the producers of popular browsers (indeed, Netscape's beta browser Mozilla already includes a near-complete implementation). Given that MathML promises to soon be the standard for mathematical notation on the web, accessibility becomes a crucial matter. Accessibility of the content should be ensured before MathML becomes a commonly used tool in education and academia.

Current Status of MathML Accessibility...

The general XML philosophy can be summarized as "the separation of content from presentation". From the viewpoint of accessibility, this is an exciting idea: once the two are separated, content can in principle be easily obtained and presented in a manner suitable for speech output, haptics, etc. The W3C states that its "commitment to lead the Web to its full potential includes promoting a high degree of usability for people with disabilities" [5]. Unfortunately, this commitment is not easily extended to MathML. The visual nature of mathematical notation has made it too difficult to distinguish between content and presentation. While the distinction may be possible for some simple cases, like single-variable calculus where notation is centuries old and common, modern mathematicians often invent their own notation for their immediate needs.

Current Screen Readers are simply not designed to deal with this sort of input. Consider the following scenario: A visually disabled user is accessing a web page using a standard screen reader and a popular browser. Within that page, the author has included an equation, say: $\int_{-\infty}^{+\infty} e^{-x^2/2}\,dx$ (the integral from minus infinity to plus infinity of e raised to negative x squared over 2).

The MathML is detected, and presented visually. The screen reader comes across the visual representation, and reads those symbols which it recognizes: "plus minus x 2 e 2 dx minus", i.e. a linear reading of all the common symbols. Unfortunate as this is, it is the reality of the situation. There is no commonly available tool at present for presenting MathML accessibly(*).

The Challenges for Accessible Mathematics...

The difficulties in presenting mathematics in an auditory modality are daunting. Putting aside the need for the creation of a mostly new natural language notation, which agrees with the existing incomplete one, one quickly discovers challenges:

Two-Dimensional Formatting: Mathematical notation is designed for visual elegance, and to achieve it mathematicians use a two-dimensional notation. This includes the extensive use of superscripts and subscripts, elements layered on top of each other, hats and arrows, etc. The problem becomes immediately visible when one attempts to read a two-dimensional matrix aloud. Consider the equation given above: a non-ambiguous reading would need to be something along the lines of "The integral from minus infinity to positive infinity of begin-scope e to the power of begin-scope the fraction numerator negative x raised 2 denominator 2 end-scope dx end-scope". In these sorts of situations, the content of the expression is quickly obscured by the sheer mass of formatting information.

Separation of Content and Presentation: Although MathML contains a content markup, a robust reading mechanism cannot assume its use. Two facts guarantee this: some expressions lie outside the scope of the content markup yet still need to be read, and MathML authoring tools are likely to produce presentation markup even when it is not strictly necessary. Presentation markup, however, is widely ambiguous. For example, f(a+b) might mean "f multiplied by a plus b", or it might mean "the function f applied to a plus b". Z/2 might mean "one-half zed", or it might mean "the integers modulo 2". In general, with no assumptions about context, an automated process can do no better than simply read a mathematical expression symbol by symbol. Consider the first example again: "Elongated s superscript (plus infinity-sign) subscript (minus infinity-sign) e superscript ((minus x superscript 2) over 2) d x". Again the length of the expression obscures its content, and the reading fails to agree with existing conventions for mathematical speech.

Active vs. Passive Access: When a user visually examines a mathematical expression, he possesses a great deal of control over his attention. He may first scan the entire equation, then focus on whichever components he wishes. However, the user listening to a speech synthesizer does not have this level of control. Stevens and Edwards note that "Listening is essentially a passive process and users of taped books often complain of lapses in concentration and an overwhelming amount of information to try to remember... Reading often defaults to a passive reception of information at a pace dictated by speech" [1]. Raman writes that "The speaker (perhaps on an audio cassette) reads in a relentless linear fashion, from beginning to end, and the listener simply listens, with little control over the process" [2]. Mathematical equations are notorious for reaching great length and complexity. A user needs the ability to easily reread an expression as many times as he desires, and the option of breaking down that equation into "chunks", that is, the ability to analyze the surface structure of an expression, then delve into the details of the components.

The Solutions...

Solutions for these challenges do exist, and in fact have been implemented. T.V. Raman's ASTER [3] contains all of the following features, and MathTalk [1] implements some of the prosody and browsing utilities. These systems are designed for TeX, however, and do not readily lend themselves to the Web Browser-Screen Reader paradigm. The conventions they implement will nevertheless be emulated in the ATRC Mathematical Audio Browser.

Audio Formatted Natural Language: The problem of emulating two-dimensional notation in a linear format may be approached by artificially adding another dimension to the audio output. By controlling the prosody (pitch, rate of speech, etc.), different "modes" of notation may be encoded without explicit linguistic indicators. For example, superscripts may be spoken in a higher pitch, subscripts lower. Consider the following example:

Image of expression e raised minus x plus y raised 2 z. Select to hear audio

Selecting the image will play a proposed audio output for that expression, formatted according to the preceding rules. Note that explicit indications for scope were not necessary. Stevens and Edwards, in discussion of MathTalk, note that "the effects of prosody were... evaluated experimentally and shown to promote the recovery of syntactic structure, enhance the retention of lexical content and reduce the mental workload involved".
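
The pitch-based formatting described above can be sketched in a few lines of Java. This is a minimal illustration and not MAB's actual code: the class `ProsodySketch`, its helper methods, and the 20% pitch steps are all assumptions made for the example, and the output is JSML-style markup that wraps superscripts and subscripts in a `PROS` element instead of speaking explicit scope words.

```java
/** Hypothetical sketch: speak a toy expression tree with pitch changes, no scope words. */
final class ProsodySketch {

    /** A node of a toy expression tree (an assumption, not MAB's data model). */
    interface Node {
        String speak();
    }

    /** A leaf that is spoken as-is, e.g. "x" or "plus". */
    static Node text(final String words) {
        return () -> words;
    }

    /** Children spoken one after another, separated by spaces. */
    static Node row(final Node... children) {
        return () -> {
            StringBuilder sb = new StringBuilder();
            for (Node child : children) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(child.speak());
            }
            return sb.toString();
        };
    }

    /** A base followed by its superscript, spoken at a higher pitch. */
    static Node sup(final Node base, final Node exponent) {
        return () -> base.speak() + " " + pitch("+20%", exponent.speak());
    }

    /** A base followed by its subscript, spoken at a lower pitch. */
    static Node sub(final Node base, final Node index) {
        return () -> base.speak() + " " + pitch("-20%", index.speak());
    }

    /** Wraps content in a JSML-style prosody element; the step size is arbitrary. */
    private static String pitch(String change, String content) {
        return "<PROS PITCH=\"" + change + "\">" + content + "</PROS>";
    }

    public static void main(String[] args) {
        // "x squared plus y sub i", with the scopes carried by pitch alone.
        Node expr = row(sup(text("x"), text("2")), text("plus"), sub(text("y"), text("i")));
        System.out.println(expr.speak());
    }
}
```

Nested superscripts simply nest the prosody elements, so the scope of each level is carried by the markup itself. How a given synthesizer renders nested relative pitch changes is not something this sketch can guarantee.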

User Context Selection: The separation of content and presentation problem may be dealt with using a mechanism for user context selection, that is, a process by which pattern-matching rules may be added to the system. These pattern-matching rules would find common occurrences and replace them with meaningful short-cuts. For example, Z/2 with no assumptions must be read "Z over 2". If the user realizes that the given context of a paper is arithmetic, he might load a set of arithmetic rules, which might contain the rule */2 -> "one-half *". At this point, the system would read "one-half z", leading to a more immediate recognition. The need exists for pre-defined context rule sets for all the major fields of mathematics, as well as for a routine supporting user-defined rule sets. Through this means, a user could respond to a new or unsupported field of mathematics by creating his own set of rules, with which he could hear a set of documents.
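
As a rough illustration of context selection, the following Java sketch applies the first matching rule from a loaded rule set and otherwise falls back to a symbol-by-symbol reading. Everything here is hypothetical: the class `ContextRulesSketch`, the use of regular expressions over a flat string, and the two sample rules are assumptions for the example, whereas the paper's own design applies rules to a DOM representation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: user-selected context rules that override the default reading. */
final class ContextRulesSketch {

    /** One pattern-matching rule: a regular expression and its spoken replacement. */
    static final class Rule {
        final Pattern pattern;
        final String spoken;

        Rule(String regex, String spoken) {
            this.pattern = Pattern.compile(regex);
            this.spoken = spoken;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Adds a rule to the active context; earlier rules take priority. */
    void load(Rule rule) {
        rules.add(rule);
    }

    /** Reads an expression using the first rule that matches it entirely. */
    String read(String expression) {
        for (Rule rule : rules) {
            Matcher m = rule.pattern.matcher(expression);
            if (m.matches()) {
                return m.replaceFirst(rule.spoken);
            }
        }
        // No context applies: fall back to a literal, symbol-by-symbol reading.
        return expression.replace("/", " over ");
    }

    public static void main(String[] args) {
        ContextRulesSketch noContext = new ContextRulesSketch();
        System.out.println(noContext.read("Z/2"));

        // An "arithmetic" context containing the rule */2 -> "one-half *".
        ContextRulesSketch arithmetic = new ContextRulesSketch();
        arithmetic.load(new Rule("(\\w+)/2", "one-half $1"));
        System.out.println(arithmetic.read("Z/2"));

        // An "algebra" context reads the same symbols differently.
        ContextRulesSketch algebra = new ContextRulesSketch();
        algebra.load(new Rule("Z/(\\d+)", "the integers modulo $1"));
        System.out.println(algebra.read("Z/2"));
    }
}
```

The same input yields three different readings depending on which context the user has loaded, which is the point of leaving the choice of context to the listener.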

Document Browsing: While audio formatting might aid in the grasping of syntactic structure, it is not sufficient to introduce a robust mechanism for the processing of mathematical notation. The need to return control over the pace and order of the reading is essential - one way to accomplish this is through a "document Tokenizer", that is, a mechanism for breaking up a document into smaller and logically ordered chunks, more simply processed by a listener. In adding the ability to browse those tokens in any order, the listener is transferred into an active role, controlling which elements are re-read and which are skipped.
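
A token browser of this kind can be sketched as a cursor over an ordered list of chunks. The class `TokenBrowserSketch` and its method names are hypothetical, and the tokens are supplied by hand; how a real document should be split into tokens is a separate question that this example does not address.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: lets a listener step through document tokens in any order. */
final class TokenBrowserSketch {

    private final List<String> tokens;
    private int position = 0;

    TokenBrowserSketch(List<String> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("need at least one token");
        }
        this.tokens = tokens;
    }

    /** Returns the current token again, i.e. a re-read. */
    String current() {
        return tokens.get(position);
    }

    /** Moves forward one token, staying on the last token at the end. */
    String next() {
        if (position < tokens.size() - 1) {
            position++;
        }
        return current();
    }

    /** Moves back one token, staying on the first token at the start. */
    String previous() {
        if (position > 0) {
            position--;
        }
        return current();
    }

    public static void main(String[] args) {
        TokenBrowserSketch browser = new TokenBrowserSketch(Arrays.asList(
                "And we use the expression:",
                "x squared plus y sub i",
                "where i ranges over the integers."));
        System.out.println(browser.current());  // first token
        System.out.println(browser.next());     // the expression
        System.out.println(browser.current());  // re-read the expression
        System.out.println(browser.previous()); // back to the first token
    }
}
```

In a real application each returned string would be handed to the speech synthesizer rather than printed.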

The ATRC's Mathematical Audio Browser...

Let us reconsider the initial scenario. The user is browsing a web page containing a MathML expression, and she is alerted to its existence (either through some continuously present JavaScript which alerts her, or simply through hearing the Screen Reader read something like "And we use the expression: x x y 3 minus f a"). She launches her local MAB application, and copies the URL from her browser into the MAB URL field. At this point:

At this point our user may activate the tokens (i.e. command them to be read), one by one, or ask that the entire document be read. She may also reread tokens, cycle forward or backward through those tokens, or load some new (pre-defined or immediately created) context to deal with the MathML.

Through the parsing of the XHTML document, MAB hopes to provide some rudimentary active listening to the user, returning some of the control lost in the use of a passive medium. The use of audio formatting will recreate the visual elegance lost in conversion to linear form. Inclusion of context-selection routines can provide the user means of escaping the presentation markup, both through standard included sets of rules, and through user-defined sets tailored for particular cases.

About MAB...

MAB consists of three essential components: The MAB User Interface (MUI), the XHTML/MathML Tokenizer, and the Speech Synthesizer. It is believed that all of these components may be made accessible and cross-platform portable.

The MUI will be written as a simple window-style application using Java Swing. The use of Swing will allow the MUI to be cross-platform, and will provide standard accessible access to all its controls. Given the relatively small control-set, all controls will be placed within one cohesive frame to ease operational complexity. MAB's output will be entirely auditory, including feedback for MUI events. A low vision user may still correlate the audio output with a visual display by running the visual browser in the background of the MUI. Given a fairly intuitive methodology for tokenizing the XHTML document, this should prove sufficient for their needs.

The XHTML/MathML Tokenizer will incorporate the Apache Foundation's Xalan XML Parser(c), using its DOM representation to facilitate its functionality. The DOM will first be used to apply pre-defined and user-defined context rules, then for a translation of the resulting MathML and XHTML elements to Java Speech Markup Language (JSML). This component will be broken into two distinct modules: the XHTML to JSML module, and the MathML to JSML module. The purpose of this split is to allow both modules to be reused in other related software packages, encouraging a standardized set of transformation rules.
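
The translation step can be illustrated with a small Java sketch that walks a DOM tree of MathML presentation markup and emits JSML-style text. It is only a sketch under stated assumptions: it uses the JDK's built-in DOM parser rather than the parser named above, it handles only `msup` and `msub` specially, it skips the rule-application pass entirely, and the class `MathMlToJsmlSketch` and its 20% pitch steps are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch: translate a fragment of presentation MathML to JSML-style text. */
final class MathMlToJsmlSketch {

    /** Parses unprefixed MathML from a string and returns JSML-style markup. */
    static String translate(String mathml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(mathml)));
            return "<JSML>" + speak(doc.getDocumentElement()) + "</JSML>";
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse MathML", e);
        }
    }

    /** Recursively turns one element into speech text (no XML escaping is done). */
    private static String speak(Element element) {
        List<Element> kids = childElements(element);
        if (kids.isEmpty()) {
            // Token elements such as mi, mn and mo: speak their text directly.
            return element.getTextContent().trim();
        }
        switch (element.getTagName()) {
            case "msup": // assumes well-formed input with exactly two children
                return speak(kids.get(0)) + " " + pros("+20%", speak(kids.get(1)));
            case "msub":
                return speak(kids.get(0)) + " " + pros("-20%", speak(kids.get(1)));
            default: // math, mrow and anything unrecognised: read children in order
                StringBuilder sb = new StringBuilder();
                for (Element kid : kids) {
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(speak(kid));
                }
                return sb.toString();
        }
    }

    /** Collects child elements, skipping whitespace and other non-element nodes. */
    private static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                result.add((Element) n);
            }
        }
        return result;
    }

    private static String pros(String change, String content) {
        return "<PROS PITCH=\"" + change + "\">" + content + "</PROS>";
    }

    public static void main(String[] args) {
        String mathml = "<math><mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo>"
                + "<msub><mi>y</mi><mi>i</mi></msub></mrow></math>";
        System.out.println(translate(mathml));
    }
}
```

Keeping the MathML translation in its own class mirrors the modular split described above, so that the same translation could be reused by other software.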

In order to synthesize speech with all the appropriate prosody features, MAB will use the JSML in conjunction with the Java Speech API (JSAPI). For the purposes of the prototype, IBM ViaVoice's (TM) implementation will be used. Given that implementations exist on virtually every platform, portability of the product should not be compromised.

MAB is currently under development at the ATRC, a prototype being expected in January 2001. The source code for all components will be available freely and publicly.

Future Prospects...

MAB has a great deal of potential for growth beyond the plans for its initial prototype. The first and most obvious extension is a more detailed break-down of mathematical tokens, to allow for the "skimming of surface structure". A second extension involves a more detailed set of standardized rules for reading mathematics. Since the MAB source will be publicly released, and since it is written in a cross-platform and modular language, re-using the specific component for the translation of MathML to JSML in other software packages will be easily accomplished, and is strongly encouraged. A third possibility, related to the second, is the adaptation of the MathML to JSML component into a MathML to CSS2 Audio Formatted XHTML component. This could potentially allow the translation to be included directly within a browser, rather than requiring a secondary application.

(*) - Some work is currently underway: MAVIS is developing a MathML to Nemeth Braille code converter [4].


  1. Edwards, A., Stevens, R. Mathtalk: Usable Access to Mathematics, (University of York; 1994; http://www.rit.edu/~easi/itd/itdv01n4/article5.html)
  2. Hayes, B. Speaking of Mathematics, (American Scientist; March-April 1996)
  3. Raman, T.V. Audio System for Technical Readings, (PhD. Thesis, Cornell University; 1994; http://www.cs.cornell.edu/home/raman)

    Web References:
  4. Scientific Notebook to Nemeth Code Converter (c), (21.10.98; http://www.nmsu.edu/~mavis/converter.html)
  5. Web Accessibility Initiative (WAI) Home Page, (22.09.00; http://www.w3.org/WAI)
  6. W3C Math Home Page, (02.06.00; http://www.w3.org/Math)
The author would like to thank Network Ontario's Telecommunication Access Partnerships (Ontario Ministry of Energy, Science and Technology) for their funding contribution to the ATRC NIDE project. For further information regarding the NIDE project, please visit http://nide.snow.utoronto.ca/


Reprinted with author(s) permission. Author(s) retain copyright.