2001 Conference Proceedings


Translation on Demand Anytime and Anywhere

Gottfried Zimmermann, Ph.D.

Gregg Vanderheiden, Ph.D.
Trace R&D Center
5901 Research Park Blvd.
Madison, WI 53719


In face-to-face conversations, people with functional limitations and their interlocutors may encounter communication problems if no assistance is available. A similar problem arises when the dialog partner is a computer application or web site rather than a human being and its user interface lacks full accessibility. Whether in face-to-face communication or in front of a computer, an instant translation or interpreter service, available anytime and anywhere, would help to bridge the communication gap between users with special needs and their communication partners.

This paper describes the principles and possible applications of a network-based "translation on demand" service. It concentrates on services for deaf and hard-of-hearing persons (text-captioning on demand, signing on demand) and a service for blind and visually impaired persons (description on demand).


Human society relies heavily on communication. This applies to all kinds of conversation, whether a personal conversation, a presentation in a business meeting or a human-computer dialog. If a communication partner has a functional limitation that prevents him from gaining full access to the information provided, there should be a translation from one communication mode to another. Thus for deaf or hard-of-hearing persons audio content must be translated to text or sign language (and often vice versa). Vision-impaired participants may need a description of a picture or of any visual object that is essential in a particular context.

People with functional limitations often experience wide communication gaps in everyday situations. Consider, for example, a hearing-impaired participant in a meeting where information is exchanged mostly by speech. A sign language interpreter (or, for a visual object, a verbal description) is not always available. In the same meeting there might be a blind person who has no access to a diagram that somebody brought in and that is now at the center of the discussion, unless someone describes the diagram verbally as the debate goes on. Both the deaf and the blind participant in this example are precluded from fully participating in the meeting because of their functional limitations.

A similar problem in the emerging "information society" stems from the fact that we are increasingly dependent on unrestricted access to online information provided by a public information network. Guidelines help in creating accessible web sites and application interfaces for cross-disability access. However, for a number of reasons, not all web sites and applications conform to these guidelines. Moreover, there is no "guideline solution" for dynamic content, e.g. live images provided by web cams or live audio streams in distributed collaborative environments.

Translation on Demand

Basically, an "assistant on demand" is an individual who could be called up to assist someone with a disability anytime they required it, but who would not be around the rest of the time. The concept of "translation on demand" envisions a network-based translation service that is available anytime and anywhere and that may be human-based, computer-based, or both [Vanderheiden, 1995] [Vanderheiden, in press]. This paper concentrates on three types of translation on demand services provided for different user groups: text-captioning on demand, signing on demand and description on demand.

Text-Captioning on Demand

Text-captioning on demand instantly translates speech to text for deaf or hard-of-hearing people.

Today's speech-recognition software achieves reasonable recognition rates only with a restricted vocabulary or with speakers on whose voices the system has been trained beforehand. This does not work for the many everyday situations in which arbitrary people talk together. Speech-recognition software can still help to produce an accurate speech-to-text translation, however, if a dedicated speaker repeats everything that the other people say. In fact, this technique works well even if the dedicated speaker is at a remote location and connected to the other parties only via a wide-area network. The network conveys acoustic information in one direction and text-based information in the other. The text is then displayed on a screen, a hand-held display or special eye glasses (e.g. the Personal Captioner [PCS]).

Another technique for a text-captioning service is a trained person typing on a stenographic keyboard connected to a computer (e.g. the National Captioning Institute [NCI]). This person could be remotely connected to the customer in the same manner as described above.

Signing on Demand

Signing on demand is a remote service accessed via a wide-area network. An audio stream from the point-of-need location is sent to a remote human sign interpreter. In return, a video stream showing the interpreter's signing is sent over the network and shown on a screen or hand-held device for the hearing-impaired person(s). For sign-to-speech translation, a video stream from the location to the remote interpreter is needed in addition, and the interpreter's speech is transferred back to the requesting location as an audio stream. An accompanying video stream from the location to the remote service may also give the sign interpreter valuable context information for the translation.
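The stream directions described above can be summarized in a short sketch. The Python fragment below is illustrative only: the names `Stream` and `signing_streams` and the endpoint labels are assumptions introduced here and do not refer to any existing system. It also assumes that a single camera feed from the location can serve both sign-to-speech translation and visual context, which the text does not state.

```python
from dataclasses import dataclass

# Hypothetical endpoint labels, introduced for illustration only.
SITE = "point-of-need location"
INTERPRETER = "remote sign interpreter"


@dataclass(frozen=True)
class Stream:
    media: str    # "audio" or "video"
    source: str   # endpoint that sends the stream
    sink: str     # endpoint that receives the stream
    purpose: str  # what the stream carries


def signing_streams(two_way=True, context_video=True):
    """List the network streams a signing-on-demand session would need."""
    streams = [
        Stream("audio", SITE, INTERPRETER, "speech to be interpreted"),
        Stream("video", INTERPRETER, SITE, "interpreter's signing"),
    ]
    if two_way or context_video:
        # Assumption: one camera feed covers both the user's signing
        # and the visual context for the interpreter.
        purpose = "user's signing" if two_way else "visual context"
        streams.append(Stream("video", SITE, INTERPRETER, purpose))
    if two_way:
        streams.append(Stream("audio", INTERPRETER, SITE, "interpreter's speech"))
    return streams


for s in signing_streams():
    print(f"{s.media:5} {s.source} -> {s.sink}: {s.purpose}")
```

A speech-to-sign-only session without context video reduces to the first two streams, which is the minimal configuration the paragraph describes.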

There are several reasons why sign language translation may be preferable to text translation in certain situations. 1) Sign language (particularly American Sign Language) can express information that is conveyed by speech but cannot be encoded in plain text (e.g. emphasis or timing of spoken words). 2) A deaf or hard-of-hearing person may wish to take an active part in a live spoken conversation using sign rather than text; for a signer it is more natural to watch signing and respond in signs than to read text and respond in signs. 3) Not all hearing-impaired people can read and understand English at conversational speeds. In conclusion, speech-to-sign translation is an appropriate means where information exchange relies mainly on spoken language or where hearing-impaired participants lack sufficient reading skills. Signing on demand should provide both translation directions, speech-to-sign and sign-to-speech.

With emerging speech and image recognition, machine-translation and avatar (computer-generated human-like character) rendering techniques, this service might eventually be provided in a fully automatic mode. Even then it might remain a remote service because of the enormous computing power required. An audio stream would be transferred to a remote service application, which in return would send movement commands for a signing avatar rendered on a screen or hand-held display. In the other direction, the video image of a signing person would be analyzed by the remote service, translated into synthesized speech and sent back as an audio stream to the requesting location.

Description on Demand

Description on demand translates visual information into verbal form for blind and visually impaired persons where no other verbal description is available.

This service is provided by a person remotely connected to the requesting person. To this end, a video stream (or image) showing the visual environment or object to be described must be provided to the service. The verbal description delivered by the service personnel is then sent over the network and delivered to the requestor through speakers, headphones or an earbud.

Variations of Application

There is an almost infinite number of possible applications for the translation on demand service. This section will only briefly mention some highlights in order to describe the potential of this service.

Remote Signing for Public Events

At the SuperComputing 99 conference, which was held at Portland's Oregon Convention Center in November 1999, the Trace R&D Center [Trace] demonstrated the feasibility of real-time sign language translation over a high-speed Internet connection for a wide audience [Barnicle et al., 2000]. The audio stream of the plenary session was sent to the remote Trace Center in Madison, Wisconsin, where human sign interpreters provided an instant translation service for the conference. The signing was captured on video and sent back over Internet II to the conference location, where it was shown on a large screen at the front of the room.

In this manner a cost-effective, high-quality sign language translation service can be provided anytime and anywhere via a high-bandwidth global network.

Text-Captioning via Special Eye Glasses

Special eye glasses with a built-in monitor from Personal Captioning Systems [PCS] provide discreet personal captioning. The text captions are delivered by wireless transmission, and the words seem to "float" about 18 inches in front of the eye. This system can be used in community and social activity locations where an audio signal is easily available (e.g. theatres, movie theatres, conference venues).

Moreover, in a hypothetical scenario, a hearing-impaired person could use this system in a more personal manner during conversations or meetings with hearing people. The person could hold a pen-like device with a built-in microphone to wirelessly feed a remote service with the environmental sound [Vanderheiden, 1995]. Any spoken word contained in the sound would then be translated into text and instantly shown on his eye glasses. This device would provide individuals who are deaf with the ability to carry on face-to-face conversation with anyone else who might be talking to them. However this scenario presumes that the hearing-impaired person himself has the capability to speak.

As of September 2000, Ultratec had announced a new service called "Instant Captioning". Currently in field testing, it will provide a text translation capability using different technology formats [Ultratec].

Remote Description via Earbud

For blind or visually impaired people it would be a great help if they could call for translation on demand anytime and anywhere. In a hypothetical scenario, if a blind individual became disoriented in an unknown environment, a small wireless camera, built into his cane, could transfer a 360 degree view of the current environment to a description service. A verbal description of the environment would then be sent to the earbud worn by the blind person. Or with the same camera the service could deliver a spoken description of a slide presentation in a business meeting.

Remote Signing in a Collaborative Environment

A collaborative environment is a network of hardware and software systems supporting distributed teams' conversation and collaboration. As an example such a network can facilitate a common meeting of different teams at different locations sharing an electronic white-board and electronic documents.

Collaborative environments are an ideal configuration for a remote sign language translation service. With their network of (wirelessly) connected devices (video cams, large screens, microphones, speakers, electronic white-boards) they offer a complete platform for audio and video capture and rendering. In this scenario the remote location of the sign interpreter is just another node in the collaborative environment and audio and video signals are passed back and forth to this node. The video image of the sign interpreter can be displayed in a section of the large display or, more discreetly, on a personal laptop or hand-held device, wirelessly connected to the underlying network.

This scenario allows a hearing-impaired participant to hold a real-time sign language conversation in both directions: the remote sign interpreter translates the other participants' speech to sign and the hearing-impaired participant's signing to speech.

In the future a signing avatar as part of the collaborative environment system could automatically translate spoken content into sign language on demand. Also a video-based sign language recognition system could transform signing from a hearing-impaired participant into audio for hearing participants.

Built-in Translation on Demand for Computational User Interfaces

A functionally impaired computer user from time to time stumbles over (partly or totally) inaccessible user interfaces and web sites. This does not necessarily mean that the developers have been thoughtless or even ignorant. It may also be caused by the inherently dynamic nature of the content provided. In this case it would be helpful if the user could press a dedicated button (or speak a special command) to launch a text translation, sign translation or video description application. This application would connect his computer to a remote location where the appropriate interpretation service would be provided with or without human assistance.

Thus a text translation service could capture the audio stream of the requesting user's computer and display its content in text form in a separate window. Or the signing service would open a window showing a sign interpreter (or a signing avatar) translating the content of the computer's audio signal into signs. Or the description service would provide a verbal (spoken) description of inaccessible visual content, e.g. diagrams, images and videos.
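The three cases above amount to a small dispatch table: each service captures one kind of content on the user's computer and presents the result in one form. The following Python sketch illustrates this under stated assumptions; `Service`, `PLAN` and `request_translation` are hypothetical names introduced here, and the sketch does not model the actual connection to a remote service.

```python
from enum import Enum


class Service(Enum):
    TEXT_CAPTIONING = "text-captioning on demand"
    SIGNING = "signing on demand"
    DESCRIPTION = "description on demand"


# For each service: (content captured on the user's computer,
# form in which the translation is presented), as described in the text.
PLAN = {
    Service.TEXT_CAPTIONING: ("audio stream", "text in a separate window"),
    Service.SIGNING: ("audio stream", "window showing a sign interpreter or signing avatar"),
    Service.DESCRIPTION: ("visual content", "spoken description"),
}


def request_translation(service):
    """Return what a press of the (hypothetical) service button would set up."""
    captured, presented_as = PLAN[service]
    return {
        "service": service.value,
        "capture": captured,
        "present": presented_as,
    }


for service in Service:
    print(request_translation(service))
```

Keeping the mapping in one table means that a further translation mode would only require a new entry rather than a change to the request logic.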

Built into all browsers and into every operating system's graphical user interface, this service button could be the anchor of a powerful "safety net" for unexpected "accessibility crashes". The Trace R&D Center is currently working toward such a globally available translation on demand service for the Grid, the next-generation high-speed Internet matrix of services. As part of the Partnership for Advanced Computational Infrastructure [PACI] funded by the National Science Foundation (NSF), Trace aims to harness today's and tomorrow's high-tech solutions for a globally available translation service on the Grid.


[Barnicle et al., 2000] Barnicle, Kitch; Vanderheiden, Gregg; Gilman, Al. "On-Demand" Remote Sign Language Interpretation. In: 9th International World Wide Web Conference, Poster Proceedings, Amsterdam, May 15 - 19, 2000. http://www9.org/final-posters/poster26.html.

[NCI] National Captioning Institute, Inc. http://www.ncicap.org/.

[PACI] Partnership for Advanced Computational Infrastructure, Directorate for Computer and Information Science and Engineering, National Science Foundation.

[PCS] Personal Captioning Systems.

[Trace] Trace R&D Center, University of Wisconsin-Madison. http://trace.wisc.edu/

[Ultratec] Ultratec, Inc. http://www.ultratec.com/

[Vanderheiden, 1992] Vanderheiden, Gregg. A brief look at technology and mental retardation in the 21st century. In L. Rowitz (Ed.), Mental Retardation in the Year 2000 (pp. 268-278). New York, NY: Springer-Verlag New York, Inc., 1992.

[Vanderheiden, 1995] Vanderheiden, Gregg. Access to global information infrastructure (GII) and next-generation information systems. In A. Weisel, Ed., Proceedings of the 18th International Congress on Education of the Deaf - 1995. Tel Aviv, Israel: Ramot Publications - Tel Aviv University, 1995.

[Vanderheiden, in press] Vanderheiden, Gregg. Telecommunications -- Accessibility and Future Directions. In: Julio Abascal, Colette Nicolle (eds.), Inclusive Guidelines for HCI. Taylor & Francis Ltd., in press.


Reprinted with author(s) permission. Author(s) retain copyright.