2002 Conference Proceedings



WEB CONTENT TRANSCODING FOR VOICE OUTPUT

Hironobu Takagi and Chieko Asakawa
takagih@jp.ibm.com and chie@jp.ibm.com 
IBM Japan Ltd., Tokyo Research Laboratory
1623-14, Shimotsuruma, Yamato-shi, Kanagawa-ken 242-8502, Japan

INTRODUCTION

These days, fierce competition exists between major web sites, and the Web authors on all sites have been trying to make their pages visually attractive. This authoring trend causes serious difficulties for blind people who want to access the Web non-visually.

For example, various types of content tend to be combined into a single page, and advertisements are scattered everywhere on the page. Sighted users can intuitively recognize these chunks, since they are represented using various types of visual effects, such as font and background colors, spacing, font size, and images. In other words, pages are fragmented into visual groupings. There are no aural delimiters for these groupings, and therefore non-visual users find it hard to recognize each chunk and understand the content. We call this the "page fragmentation problem".

Of course, other known problems exist, such as missing ALT text and missing skip-to-main-content links. To address these difficulties, we decided to develop a Web content transcoding system. In this presentation, we will mainly describe how blind users can easily navigate through our transcoded pages with voice browsers [1] and screen readers [2], and we will describe our transcoding methods. We will also discuss how to use our system in non-visual environments, such as via telephone or while driving a car.

TRANSCODING OF WEB CONTENT

Fragmentation of a page based on visual groupings

To solve the above-mentioned fragmentation problem, our system has the ability to represent visually fragmented groupings non-visually in three ways. First, it rearranges the groupings in a page based on roles and importance to allow users to access the content based on its importance. Second, it inserts delimiter text at each boundary of the groupings, such as "Group 5 Today's headline". Third, it inserts "page indexes" at the bottom of the page to allow users to access each group directly. Figure 1 shows an example of a transcoded page. We have adopted this representation method for VoiceXML conversion, too. This will be explained below.

Figure 1 An Example of a Transcoded Page
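The three steps described above (rearrangement, delimiter insertion, and page index generation) can be illustrated with a minimal Python sketch. It is not the actual implementation: the `Group` class, its `importance` score, and the `transcode` function are hypothetical names introduced here, and the real system derives roles and importance from annotations rather than from a single number.

```python
from dataclasses import dataclass


@dataclass
class Group:
    title: str       # e.g. "Headline"
    importance: int  # hypothetical score: larger means more important
    text: str        # content of the visual grouping


def transcode(groups):
    """Return the transcoded page as a list of text lines."""
    # 1. Rearrange the groups so that the most important come first.
    #    sorted() is stable, so ties keep their original page order.
    ordered = sorted(groups, key=lambda g: -g.importance)

    lines = []
    # 2. Insert a delimiter line at the boundary before each group.
    for number, group in enumerate(ordered, start=1):
        lines.append(f"Group {number} {group.title}")
        lines.append(group.text)

    # 3. Append the page index at the bottom of the page.
    lines.append(f"There are {len(ordered)} groups on this page")
    for number, group in enumerate(ordered, start=1):
        lines.append(f"{number}. {group.title}")
    return lines
```

In a real page, each index entry would be a link to the corresponding delimiter, so that a user can jump directly to a group.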

For this rearrangement, the system needs to have a layout description of each target page. We call the layout description an "annotation" [3]. Of course, it would be impossible to have annotations for every page, and therefore we have been trying to drastically reduce the annotation authoring time by developing various methods, and programs to implement those methods, as follows:
- A dynamic annotation matching method
- A semi-automatic error correction method
- An automatic layout classification method
- A high-speed annotation matching algorithm
- A Site Pattern Analyzer integrating all of these methods

Using these technologies, we can create annotations for a large news site within a week.

Other methods

We integrated various kinds of transcoding features into this system. Automatic insertion of ALT text provides missing ALT text for image links and image maps by using the title of the destination page as the ALT text. Automatic insertion of the "skip to main content" link inserts the link at the top of the page by locating the position of the main content in the page based on some heuristic rules.
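The ALT text insertion can be sketched as follows. This is only an illustration under stated assumptions: image links are modeled as simple dictionaries with `href` and `alt` keys, and `fetch_title` is a hypothetical callable that returns the title of the destination page. Neither name comes from our system.

```python
def fill_missing_alt(image_links, fetch_title):
    """Return copies of the image links with missing ALT text filled in.

    image_links: list of dicts with "href" and "alt" keys (hypothetical model).
    fetch_title: callable that returns the destination page title for a URL
                 (hypothetical; a real system would download and parse the page).
    """
    result = []
    for link in image_links:
        alt = (link.get("alt") or "").strip()
        if not alt:
            # Use the title of the destination page as the ALT text.
            alt = fetch_title(link["href"])
        result.append({**link, "alt": alt})
    return result
```

Links that already carry ALT text are left unchanged, so the author's own description always takes precedence over the generated one.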

WEB CONTENT TRANSCODING SYSTEM

Transcoding generally means conversion of a formatted document to another document format. Figure 2 shows the system architecture of our transcoding system. We developed the system as a plug-in for WebSphere Transcoding Publisher [4]. The system works as an intermediary between a Web server and a client. When a user accesses a Web page through the system, it redirects the request to the target server, downloads the target Web page, converts it, and then sends it to the user. A user does not need to be aware of our system. A user can navigate through web pages in the usual way.

Figure 2 System Architecture

For voice access via telephone, a telephony server (WebSphere Voice Server [5]) works in cooperation with the transcoding system. This system converts Web pages into VoiceXML-format documents. VoiceXML is a standardized format for general telephony systems based on XML technology. The telephony server receives the reformatted VoiceXML documents, interprets them, and then controls the interaction with the users.

TRANSCODING FEATURES FOR VOICE BROWSERS AND SCREEN READERS

For voice browser and screen reader users, our system works just like an ordinary Web site. This means that users do not need to install any special software or change any settings of their environment. A user can access any page through our system by simply adding a prefix, such as "http://www.trans.ibm.com/", before the URL of the target page, as follows:

Example 1)
Target's URL: http://news.lycos.co.jp/ 
Through transcoding system: http://www.trans.ibm.com/news.lycos.co.jp/

Example 2)
Target's URL: http://www.ibm.com/news/us/2001/09/06.html 
Through transcoding system:
http://www.trans.ibm.com/www.ibm.com/news/us/2001/09/06.html

Users do not have to worry about the URL modification after accessing a page through the system, since all links on displayed target pages will automatically be rewritten to pass through the transcoding system.
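The URL convention in the two examples above, and the link rewriting that keeps users inside the system, can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes plain "http" URLs as in the examples. How the actual system handles other schemes is not covered here.

```python
from urllib.parse import urljoin, urlsplit

PREFIX = "http://www.trans.ibm.com/"  # prefix used in the examples above


def to_transcoded_url(target_url):
    """Map a target URL to the URL that passes through the transcoding system."""
    if urlsplit(target_url).scheme != "http":
        raise ValueError("this sketch only handles http URLs")
    # Drop the leading "http://" and prepend the transcoder prefix.
    return PREFIX + target_url[len("http://"):]


def rewrite_link(href, base_url):
    """Rewrite one link found on a target page located at base_url."""
    # Resolve relative links against the page URL before adding the prefix.
    absolute = urljoin(base_url, href)
    return to_transcoded_url(absolute)
```

For instance, `to_transcoded_url("http://news.lycos.co.jp/")` yields the URL shown in Example 1.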

Here is a more detailed description of the navigation steps for a transcoded Web page, using the Washington Post Web site (http://www.washingtonpost.com/) as the example. In the original page, a header, index lists, weather information, and other items appear above the day's top news stories. Using a voice browser (Home Page Reader), it takes about 80 seconds of listening to get from the top of the page to the beginning of the main content. Even if users use the non-link text jump key of Jaws, they must press that key 9 times to get to the main content, and even this method cannot find the exact location. Users usually try to get directly to the main content by skipping some information, so they tend to miss important information that appears before the main content.

Our transcoded page provides a page index at the bottom of a page to give an overview of the page, and allows users to directly jump to each group of items. Here is the first part of such an index:
There are 17 groups on this page
1. Updated time
2. Headline
3. America Attacked
4. Today's photo
5. What you need to know
6. Other news
7. Weather

If "Headline" is selected, the following content will be read:
Saudi Arabia Balks at Use of U.S. Base for Strikes
Unless Powell can persuade officials, the need to develop alternatives would delay a campaign for weeks.

After the content of this group is read, the next delimiter would be read as "Group 3 America Attacked". This informs users of the end of one group and introduces the content of the next group. In these ways, the page index and delimiters make non-visual Web access easy, and allow users access to all of the content in a target page without hearing irrelevant material.

TRANSCODING FEATURES FOR PHONE ACCESS

Phone-based Web access is expected to be a new Web access channel, not only for blind people, but also for other people who need non-visual web access, such as senior citizens and drivers. Some senior citizens may feel uncomfortable using high-tech devices, and some drivers want Web access, even though they can't look at a computer display while driving. Since the requirements of blind users and sighted users for their non-visual Web access are quite different, we have to consider different user interfaces for each situation. Therefore we developed the following two transcoding modes for phone-based Web access:

1. Basic mode
This mode provides a very simple user interface for non-visual Web access by sighted users, such as senior citizens and drivers. Only three commands are assigned to the keypad: "skip current group", "go to group list", and "go to link list". Figure 3 shows the access model for this basic mode. Voice navigation commands are also assigned for these three commands.

2. Voice browser simulation mode
The number pad operation of a voice browser, Home Page Reader, is simulated, and more than 20 commands are assigned to the telephone keypad. Voice commands are also assigned for these commands.

Figure 3 Access Model of VoiceXML Transcoding (basic mode)

Either of these two modes can be selected according to each user's situation. The following example uses the basic mode.

When a user opens the Washington Post top page, the system reads the group list as follows:

Welcome to washingtonpost.com News Front.
There are 17 groups on this page. Please select a group.
1. Updated time
2. Headline
3. America Attacked
4. Today's photo

The user can select any group by pressing the group number on the keypad, by saying the number, or by saying a keyword included in the group title, such as "headline" or "photo".
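As an illustration, the group list above could be expressed as a VoiceXML menu along the following lines. This Python sketch is not the output of our system: the function name and the `#groupN` targets are hypothetical, and it relies on the `menu`, `prompt`, and `choice` elements and the `dtmf`, `next`, and `accept` attributes as defined in VoiceXML 2.0. Groups numbered 10 and above would need multi-digit key handling, which the sketch does not address.

```python
import xml.etree.ElementTree as ET


def group_menu_vxml(page_title, group_titles):
    """Build a VoiceXML menu that reads the group list and accepts a choice."""
    vxml = ET.Element("vxml", version="2.0")
    # accept="approximate" asks the platform to match partial phrases too,
    # so that "photo" could select "Today's photo" (platform support assumed).
    menu = ET.SubElement(vxml, "menu", id="group_list", accept="approximate")

    numbered = " ".join(
        f"{number}. {title}." for number, title in enumerate(group_titles, start=1)
    )
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = (
        f"Welcome to {page_title}. "
        f"There are {len(group_titles)} groups on this page. "
        f"Please select a group. {numbered}"
    )

    for number, title in enumerate(group_titles, start=1):
        # dtmf handles keypad selection; the element text handles speech.
        # "#group3" is a hypothetical target: the dialog that reads group 3.
        choice = ET.SubElement(
            menu, "choice", dtmf=str(number), next=f"#group{number}"
        )
        choice.text = title
    return ET.tostring(vxml, encoding="unicode")
```

Calling `group_menu_vxml("washingtonpost.com News Front", ["Updated time", "Headline"])` returns a document with one `choice` element per group.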

After a group is selected, the system reads the content of the group, and then continues reading the following groups. If the "next group" command is selected, the system will stop reading the current group and jump to the next group. If the "link list" command is selected, the system will read a selection menu of the links included in the current group, such as:

There are 7 links. Please select a link.
1. Bush Turns to Domestic Agenda
2. Stocks Surge; Dow Rises 4.5%
3. Nimda Virus Closes Fairfax Web Site

As soon as a link is selected, the destination page will be opened. In this way, the system allows users to access the Web non-visually through a telephone.
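The basic-mode interaction described above can be modeled as a small session object. This is a simplified sketch under our own assumptions: the class and method names are hypothetical, the session starts at the first group, and skipping past the last group simply returns `None`, which may differ from the behavior of the actual system.

```python
from dataclasses import dataclass, field


@dataclass
class PhoneGroup:
    title: str
    text: str
    links: list = field(default_factory=list)  # link titles in this group


class BasicModeSession:
    """Models the three basic-mode commands for one transcoded page."""

    def __init__(self, groups):
        self.groups = groups
        self.current = 0  # index of the group being read

    def go_to_group_list(self):
        count = len(self.groups)
        lines = [f"There are {count} groups on this page. Please select a group."]
        lines += [f"{n}. {g.title}" for n, g in enumerate(self.groups, start=1)]
        return lines

    def select_group(self, number):
        if not 1 <= number <= len(self.groups):
            raise ValueError(f"no group {number} on this page")
        self.current = number - 1
        return self.groups[self.current].text

    def skip_current_group(self):
        # Stop reading the current group and jump to the next one.
        if self.current + 1 >= len(self.groups):
            return None  # assumption: nothing left to read
        self.current += 1
        return self.groups[self.current].text

    def go_to_link_list(self):
        links = self.groups[self.current].links
        lines = [f"There are {len(links)} links. Please select a link."]
        lines += [f"{n}. {title}" for n, title in enumerate(links, start=1)]
        return lines
```

Each method returns the text that would be passed to the speech synthesizer, so the same model can be driven either by keypad input or by voice commands.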

CONCLUSION

We have developed a Web content transcoding system for voice browser and screen reader users. It rearranges the original contents based on visual groupings, inserts delimiters at each boundary between groupings, and prepares an index of the page. This system is also capable of creating pages for phone-based Web access. We have developed a prototype system for VoiceXML transcoding and shown that it can be useful with existing Web pages.

In March 2001, we opened a pilot Web service together with Lycos Japan. Part of the IBM Japan Web site has been linked to this service since September 2001. The system transcodes these pages using our transcoding features. Our next plan is to support a phone-based Web access system that uses sound effects, and to extend the system to serve not only blind users but also anyone who has difficulties with the standard user interface of the Web.

REFERENCES

[1] IBM Corporation, Home Page Reader, http://www-3.ibm.com/able/hpr.html

[2] Freedom Scientific, Jaws, http://www.hj.com/JAWS/JAWS.html

[3] Asakawa, C., Takagi, H. Annotation-Based Transcoding for Non-visual Web Access, in Proceedings of The Fourth International ACM Conference on Assistive Technologies ASSETS 2000 (Nov 2000), 172-179.

[4] IBM Corporation, WebSphere Transcoding Publisher, http://www-4.ibm.com/software/webservers/transcoding/

[5] IBM Corporation, WebSphere Voice Server, http://www-4.ibm.com/software/speech/enterprise/ep_1.html




Reprinted with author(s) permission. Author(s) retain copyright.