2001 Conference Proceedings
Transcoding System for the Non-Visual Web Access (2) --
Annotation-based Transcoding --
Chieko Asakawa
chie@jp.ibm.com
Hironobu Takagi
takagih@jp.ibm.com
IBM Japan Ltd.
Tokyo Research Laboratory
1623-14, Shimotsuruma
Yamato-shi, Kanagawa-ken 242-8502
Japan
Introduction
The role of the Web keeps widening, and Web authors tend to
present as much information as possible on one page. A news
site, for example, contains not only articles but also shopping
lists, hobby-related information, advertisements and so on. This
information is visually fragmented into groupings by means of
visual effects such as different background colors, different
fonts, layout tables and spacing. Blind users read Web content in
tag order, but visually fragmented groupings are not accessible
through tag-order reading, so this authoring trend has been
making non-visual Web access harder.
Therefore, we decided to develop a transcoding system that works
as an intermediary between a server and a user and improves
non-visual Web access. Our system consists of two parts, one for
automatic transcoding and one for annotation-based transcoding.
Both methods have pros and cons. Automatic transcoding can
simplify a Web page without any manually produced annotations,
but its use is sometimes limited because it can deal neither with
visually fragmented groupings nor with the distinct roles of the
groups. Annotation-based transcoding can create an accessible
page for voice output without removing any content, but it
requires external annotations. We therefore use both methods to
improve non-visual Web access. In this paper, we focus on
annotation-based transcoding. The most important objective is to
transcode existing Web pages, which are presented as
two-dimensional information, so that they become accessible as
one-dimensional information. After introducing the system
architecture, we describe our proposed annotations, which can
be provided by both sighted and blind annotators. We then
show examples of transcoded pages with the annotations and
discuss conclusions and plans.
Annotation-based transcoding
Architecture
Figure 1 -- Architecture of the Annotation-based Transcoder
Figure 1 shows the system architecture. The proxy server is the
main component that transcodes a target HTML document. Users
access our system simply by setting a proxy server for their
browser. We use the IBM WebSphere Transcoding Publisher (WTP) as
a proxy server and our transcoding system is implemented as a
plug-in for WTP. The plug-in consists of three main components: a
transcoding module, an annotation manager and an annotation
database. When the transcoding module receives a target HTML
document, the annotation manager searches the annotation
database for the document's URL, and WTP transcodes the target
HTML document using the annotation file returned from the
annotation database. The annotation server (not pictured) receives
annotation files which are created by sighted annotators and
registers them into the annotation database. Each annotation file
is basically linked to one URL. However, one page is sometimes
similar to others, and in such cases, the annotation files can be
shared among those pages. This is possible because our system
evaluates the similarity of HTML documents.
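As a rough illustration of the lookup described above, the
following sketch keeps annotations keyed by URL and falls back to
the annotation of a structurally similar page. The paper does not
say how similarity is evaluated; comparing start-tag sequences
with Python's difflib, the 0.9 threshold, and the names
AnnotationDatabase, register and find are all assumptions made
for this example, not the actual interface of the WTP plug-in.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects the names of all start tags, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    parser = TagSequence()
    parser.feed(html)
    return parser.tags


def similarity(html_a, html_b):
    # Assumed measure: how closely the two start-tag sequences match (0.0 to 1.0).
    return SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b)).ratio()


class AnnotationDatabase:
    """Hypothetical in-memory stand-in for the annotation database."""

    def __init__(self, threshold=0.9):
        self.entries = {}  # url -> (page HTML at annotation time, annotation)
        self.threshold = threshold  # assumed cut-off for sharing an annotation

    def register(self, url, html, annotation):
        self.entries[url] = (html, annotation)

    def find(self, url, html):
        # An exact URL match wins.
        if url in self.entries:
            return self.entries[url][1]
        # Otherwise share the annotation of the most similar stored page, if any.
        best, best_score = None, self.threshold
        for stored_html, annotation in self.entries.values():
            score = similarity(html, stored_html)
            if score >= best_score:
                best, best_score = annotation, score
        return best


PAGE_A = "<html><body><table><tr><td><a href='x'>News</a></td><td><p>Story one</p></td></tr></table></body></html>"
PAGE_B = "<html><body><table><tr><td><a href='y'>News</a></td><td><p>Story two</p></td></tr></table></body></html>"
PAGE_C = "<html><body><h1>Hello</h1></body></html>"

db = AnnotationDatabase()
db.register("http://www.example.com/story1.html", PAGE_A, "annotation-for-story-layout")
```

With this sketch, a page that shares the layout of an annotated
page (PAGE_B) receives the same annotation even though its URL is
not registered, while a page with a different structure (PAGE_C)
receives none.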
Visually-specified annotations
Visually-specified annotations have two components: structural
annotations and commentary annotations.
Structural annotations
The system uses the structural annotations to recognize visually
fragmented groupings as well as to show the importance and basic
role of each group. Basically, there are three XML tags in this
kind of annotation: <member />, <role /> and <importance />. A
<member /> tag consists of one or more HTML elements which belong
to one group. A <role /> tag describes the role of each group. A
visually fragmented group generally has a role such as main
content, header or footer of a page, index, advertisement, and so
on. An <importance /> tag indicates the importance of each
group.
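To make the three tags concrete, the sketch below parses a small
structural annotation with Python's xml.etree.ElementTree. The
paper names only the member, role and importance tags; the
enclosing annotation and group elements, the path attribute that
points at an HTML element, and the numeric importance scale are
hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical annotation file; only <member>, <role> and <importance>
# are named in the text, everything else is assumed for illustration.
SAMPLE_ANNOTATION = """
<annotation url="http://www.example.com/news.html">
  <group>
    <member path="/html/body/table[1]/tr[1]/td[2]" />
    <member path="/html/body/table[1]/tr[2]/td[2]" />
    <role>main content</role>
    <importance>1.0</importance>
  </group>
  <group>
    <member path="/html/body/table[1]/tr[1]/td[3]" />
    <role>advertisement</role>
    <importance>0.0</importance>
  </group>
</annotation>
"""


@dataclass
class Group:
    members: list      # paths of the HTML elements that form the group
    role: str          # e.g. "main content", "header", "advertisement"
    importance: float  # higher means more important (assumed scale)


def parse_structural_annotation(xml_text):
    """Return one Group per <group> element in the annotation."""
    root = ET.fromstring(xml_text)
    groups = []
    for node in root.findall("group"):
        members = [m.get("path") for m in node.findall("member")]
        role = node.findtext("role", default="").strip()
        importance = float(node.findtext("importance", default="0"))
        groups.append(Group(members, role, importance))
    return groups


groups = parse_structural_annotation(SAMPLE_ANNOTATION)
```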
We have been prototyping a WYSIWYG authoring tool for this
purpose. With this tool, sighted annotators can indicate each
grouping using a mouse and a keyboard while looking at the
Internet Explorer (IE) screen. First, a sighted annotator
collects the elements of a visually fragmented group. After all
of the elements in one group are selected, they can be registered
to a <member /> tag as one group. Next, he or she annotates the
grouping with a role. The system provides default roles as
selection items, but when none of them fits the target group's
role, annotators can describe the role themselves. When a role is
selected from the selection items, an <importance /> tag is
defined automatically. For example, the main content is
assigned as the most important group, while an advertisement has
the lowest importance. Annotators can also indicate the
importance when they describe a role for the group.
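The role-to-importance defaults can be pictured as a simple
lookup table. In the sketch below, only the two end points come
from the text (main content is the most important, advertisement
the least); the intermediate values, the fallback of 0.5 and the
function names are placeholders. Ordering groups by importance to
obtain a one-dimensional reading order is likewise an assumption
about how a transcoder might use these values.

```python
# Placeholder defaults: only the highest (main content) and lowest
# (advertisement) positions are stated in the text.
DEFAULT_IMPORTANCE = {
    "main content": 1.0,
    "index": 0.6,
    "header": 0.4,
    "footer": 0.3,
    "advertisement": 0.0,
}


def importance_for(role, annotator_value=None):
    """Use the annotator's value if given, else the default for the role."""
    if annotator_value is not None:
        return annotator_value
    return DEFAULT_IMPORTANCE.get(role, 0.5)  # hypothetical fallback for custom roles


def reading_order(groups):
    """Sort (role, importance) pairs so the most important group is read first.

    sorted() is stable, so groups with equal importance keep their page order.
    """
    return sorted(groups, key=lambda group: -group[1])


page_groups = [
    ("header", importance_for("header")),
    ("advertisement", importance_for("advertisement")),
    ("main content", importance_for("main content")),
    ("related stories", importance_for("related stories", annotator_value=0.7)),
]
ordered_roles = [role for role, _ in reading_order(page_groups)]
```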
Commentary annotations
Commentary annotations are used to give a useful description of
each group. They can also be used to describe HTML tags, such as
<form></form> tags, <img /> tags, <area /> tags and so on. In the
case of an <img /> tag, there is currently no way to give users
any description of an image on the fly when the image has no
alternative text (alt attribute). Our annotation is
described externally, so when an annotator thinks it is important
to annotate an image with an explanation, it can be done
easily.
In the case of a form, it might be annotated as "With this form,
you can search for titles of the video library," or "For this
input box, you can only input numbers." These are just examples;
actual commentary annotations are written freely. Any comment
that improves non-visual Web access is helpful and appreciated.
Basically, there are two XML tags for commentary annotation,
<member /> and <comment />. A <member /> tag consists of one or
more HTML elements which form one grouping. A <comment /> tag
provides a comment for a <member /> tag. With the authoring tool,
annotators first select a group for a commentary annotation in
the same way they visually select fragmented elements for
structural annotation. Then they describe the group in a
comment.
Both structural and commentary annotations are saved as external
XML documents and registered into the annotation database.
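One way to picture how externally stored comments could reach a
voice-output user is to merge them into the linearized text just
before the element they describe. The sketch below does this for
a list of (element id, text) pairs. The element ids, the
"[Comment: ...]" wording and the function name are hypothetical;
the text above does not specify how comments are presented.

```python
def apply_comments(elements, comments):
    """Return the lines to be read aloud, with comments placed before their target.

    elements: list of (element_id, text) pairs in reading order
    comments: dict mapping element_id to a commentary annotation
    """
    lines = []
    for element_id, text in elements:
        comment = comments.get(element_id)
        if comment:
            lines.append(f"[Comment: {comment}]")
        if text:  # e.g. an image without alternative text contributes no text of its own
            lines.append(text)
    return lines


elements = [
    ("img-1", ""),  # image with no alternative text
    ("form-1", "Title: [input box] [Search button]"),
]
comments = {
    "img-1": "Photograph of the video library entrance.",
    "form-1": "With this form, you can search for titles of the video library.",
}
spoken = apply_comments(elements, comments)
```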
User annotation
User annotations consist of two kinds of
annotations, one for selecting the main content and one for
selecting the most useful form. Any page that is shown via our
transcoding proxy has a "Settings" link at the bottom of the
page. When a user selects this link, the two links for "user
annotation" appear in the settings menu.
The annotation authoring tool for blind users differs from the
one used by sighted annotators. We have been prototyping it in
the form of a Web application. This allows blind users to use it
in an integrated way, as an extension of their regular surfing of
the Net.
Selecting main content
When a user selects the "main content" command, the system
inserts links with commands to the proxy at certain candidate
elements on the screen. When a user selects one of these links,
the system registers that position as the starting point of the
main content in that page. The candidate locations are chosen
heuristically, based on our experience of where the main content
is likely to start. For example, a string which has more than 40
characters without any link, or a string which starts under a
horizontal separation line, might be the start of the main
content. Once registered, the starting point of the main content
in a page can be used in two ways. One use is as the target
for an image link with "skip to main content" in its alt
attribute. The other use is for moving information that was
originally above the main content to the bottom of the page. In
this way, a user can find the main content of the page very
easily.
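The two example heuristics can be sketched with Python's
html.parser as follows. The 40-character threshold and the
horizontal-line rule come from the text; treating each text node
as one "string", mapping the separation line to the hr element,
and the names CandidateFinder and move_preamble_to_bottom are
assumptions of this example, and the actual system may use
further rules.

```python
from html.parser import HTMLParser

MIN_LENGTH = 40  # "more than 40 characters", as stated in the text


class CandidateFinder(HTMLParser):
    """Collects text strings that might start the main content."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0      # greater than 0 while inside an <a> element
        self.after_rule = False  # True until the first text after an <hr>
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag == "hr":
            self.after_rule = True

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return  # ignore whitespace between elements
        long_plain_text = self.link_depth == 0 and len(text) > MIN_LENGTH
        if long_plain_text or self.after_rule:
            self.candidates.append(text)
        self.after_rule = False


def find_candidates(html):
    finder = CandidateFinder()
    finder.feed(html)
    return finder.candidates


def move_preamble_to_bottom(blocks, start_index):
    """Put everything before the registered starting point after the main content."""
    return blocks[start_index:] + blocks[:start_index]


SAMPLE_PAGE = """
<a href="/">Home</a>
<a href="/sports">A very long link label that is well over forty characters long</a>
<p>Short text</p>
<p>This paragraph is comfortably longer than forty characters in total.</p>
<hr>
<p>Weather</p>
"""
candidates = find_candidates(SAMPLE_PAGE)
```

Here the long link label is skipped because it is a link, the
long paragraph qualifies by length, and "Weather" qualifies
because it is the first string under the horizontal line.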
Selecting the most useful form
When the "the most useful form" option is selected, a link
appears before the beginning of each