2000 Conference Proceedings



An Alternative Method for Building A Database for American Sign Language

Noriko Tomuro, Karen Alkoby, Andre Berthiaume, Pattaraporn Chomwong, Mary Jo Davidson, Jacob Furst, Brian Konie, Glenn Lancaster, Steven Lytinen, John McDonald, Lopa Roychoudhuri, Jorge Toro, Rosalee Wolfe

School of Computer Science, Telecommunications and Information Systems
DePaul University
Chicago, IL
Email: asl@cs.depaul.edu

Research shows that the English literacy of deaf people in the United States is much lower than that of hearing people: most deaf adults read English at the third or fourth grade level. One reason is that English and American Sign Language (ASL), the native language of the Deaf community in the US, have very different syntactic structures. Because of this lack of reading fluency, English-based assistive technology such as closed captioning on television makes the hearing world somewhat more accessible to deaf people, but it can serve only as a partial aid. To fully bridge the gulf between the deaf and hearing worlds, alternative assistive technologies based on sign language are needed.

1. Introduction

American Sign Language (ASL) is a natural language used by members of the North American Deaf community and is the third or fourth most widely used language in the United States [Ster96]. While ASL shares some vocabulary with English, linguistic research shows that it has a concise and elegant syntax which differs radically from English grammar [Bake80][Vall93]. This makes the acquisition of English rather challenging for ASL signers: research shows that most native ASL signers read English at the third or fourth grade level [Holt94]. Therefore, English-based assistive technologies such as closed captioning on television do not provide a sufficient solution [Wolf99]. At present, deaf people rely on sign language interpreters for access to spoken English. However, they cannot depend on interpreters in everyday life, mainly because of the high cost of accommodation and the difficulty of finding and scheduling qualified interpreters.

A better solution would be a personal digital English-to-ASL translator, which would convert written or spoken English into a three-dimensional graphic animation depicting ASL. It would be an economical as well as flexible solution: since personal computers have become lighter and are available for less than $1000, deaf people could carry a laptop wherever they go. Because such a translator would handle full English sentences, it could convey complex ideas expressed in English and would be far more useful and flexible than sign dictionaries or phrase books.

Not only would this ASL synthesis technology assist deaf people, it could also serve as a valuable tool for educators and researchers. With three-dimensional graphics, signs can be viewed from any position, from the signer's standpoint or from the observer's. It is even possible to view a signer from the side to see how far forward a sign extends. Animation can also show the timing of signs visually. Thus, this synthesis technology offers the potential for a rich and flexible environment for ASL education and research.

However, developing this kind of English-to-ASL translation system poses serious technical challenges. Because of the unique modality of ASL as a sign language -- visual/gestural rather than aural/oral -- the translation system must store both the linguistic and the geometric aspects of ASL signs and generate graphic animations on screen in real time. In addition, a sequence of signs forming an ASL sentence must look smooth and natural. Building such an English-to-ASL translator therefore requires expertise in a wide range of areas, including linguistics, machine translation, computer graphics, mathematics, and kinesiology.

We are currently in the process of developing a database of ASL signs that will serve as the lexical database in our translator. The database schema draws on the experience of Dutch [Cras98], German [Pril89], and Japanese [Lu97] researchers who are working on similar projects for other sign languages. It records such items as the position, orientation, and shape of the hands, as well as their motions.
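As a rough illustration, a single entry in such a database might resemble the following sketch in Python; the field names, value formats, and structure are illustrative assumptions, not the schema used by our project or the cited ones.

```python
# Hypothetical sketch of lexical-database entries; all names and formats are
# illustrative assumptions, not the schema the project actually uses.
from dataclasses import dataclass
from typing import List

@dataclass
class Handshape:
    name: str                  # e.g. "B", "5", "bent-V"
    joint_angles: List[float]  # finger joint rotations, in degrees

@dataclass
class HandConfiguration:
    handshape: str             # name of an entry in the handshape database
    palm_orientation: str      # e.g. "up", "down", "toward signer"
    location: str              # e.g. "chin", "chest", "neutral space"

@dataclass
class SignEntry:
    gloss: str                           # English gloss identifying the sign
    right_hand: List[HandConfiguration]  # one configuration per time step
    left_hand: List[HandConfiguration]   # empty for one-handed signs
```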

The largest task in creating a database of this type is data entry, which involves transcribing ASL. To transcribe sign language, several researchers have used motion capture, recording the motion and position of the hands through gloves fitted with sensors [Eren96][Fels98]. However, despite the considerable financial investment required for equipment, animations produced from the recorded data are often inaccurate [Zord99]. Motion capture has a further critical disadvantage: the recorded numerical data is hard to modify and hard to abstract into symbolic linguistic features of ASL signs, such as fingers being "hooked" or "bent".

Our approach is to take an animation software package and customize it for fast sign transcription. Customizing a general animation package greatly reduces transcription time. Normally, learning such a package requires a significant time investment: our students reported that working through the tutorials of a commonly used animation package took between 40 and 100 hours. Few volunteers from the deaf community are willing to invest that much time in training before beginning to transcribe. By eliminating features of the package that are irrelevant to ASL and providing an interface that is intuitive and ASL-specific, we require only minimal learning before transcription can begin.

2. Transcription System

Our transcription system has a bi-level structure. The lower level, the hand transcriber, is used to build the handshape database. The upper level, the sign transcriber, relies on the handshape database and allows users to specify the location and motion of both the left and right hands.
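The sketch below illustrates this bi-level relationship in simplified form: the hand transcriber stores handshapes, and the sign transcriber later retrieves them by name. The storage format and function names are assumptions made for illustration, not the system's actual design.

```python
# Sketch of the bi-level relationship. The in-memory dictionary is a stand-in
# for whatever storage the system actually uses.

handshape_db = {}  # name -> finger joint angles, filled by the hand transcriber

def store_handshape(name, joint_angles):
    """Lower level: the hand transcriber saves a completed handshape."""
    handshape_db[name] = joint_angles

def lookup_handshape(name):
    """Upper level: the sign transcriber retrieves a stored handshape."""
    return handshape_db[name]

store_handshape("B", [0.0] * 15)   # placeholder joint angles
print(lookup_handshape("B"))
```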

Both transcribers use familiar controls such as checkboxes, selection lists, and slider bars. Most labels name symbolic, linguistic features of ASL signs, and the underlying mathematical information is completely hidden from users. The interface is what-you-see-is-what-you-get (WYSIWYG), allowing users to see the signs graphically as they enter data. To verify that a sign appears correct from all angles, users can "walk around" any handshape or sign they create.

2.1 The Hand Transcriber

The hand transcriber allows users to specify handshapes. Users can select one or more fingers at a time and move them to a desired position. Slider bars specify the configuration of the selected fingers, for instance somewhere between "together" and "spread", or between "flat" and "hooked". Figure 1 shows an example screen capture of the hand transcriber.

Figure 1: The Hand Transcriber
[Image: screen capture of the hand transcriber interface.]
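One plausible way for such a slider to drive the hand model is to interpolate between two stored extreme poses, as in the sketch below; the pose values and function are illustrative assumptions, not the transcriber's actual implementation.

```python
# Minimal sketch: map a slider value in [0, 1] to finger joint angles by
# linearly interpolating between two extreme poses ("flat" and "hooked").
# The angle values are placeholders, not measurements from the actual system.

FLAT_POSE   = [0.0, 0.0, 0.0]     # three joint angles (degrees) for an extended finger
HOOKED_POSE = [0.0, 90.0, 90.0]   # the same joints when the finger is "hooked"

def finger_pose(slider):
    """Blend the flat and hooked poses according to the slider position."""
    slider = max(0.0, min(1.0, slider))
    return [(1.0 - slider) * flat + slider * hooked
            for flat, hooked in zip(FLAT_POSE, HOOKED_POSE)]

print(finger_pose(0.5))   # halfway between flat and hooked: [0.0, 45.0, 45.0]
```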

2.2 The Sign Transcriber

The sign transcriber is built on top of the handshape database. It allows users to specify ASL signs in terms of handshape, location, and orientation for both hands. Motions are entered by specifying this information at a series of time steps. The sign transcriber then generates the sign as an animation, with the configuration of step 1 as the starting key frame and the configuration of the last step as the ending key frame.
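The basic key-frame idea can be sketched as follows; the configuration representation and the linear interpolation scheme are assumptions made for illustration, since the transcriber's internal animation method is not described in detail here.

```python
# Rough sketch of key-frame animation: each time step entered by the user is
# treated as a key frame, and intermediate frames are produced by linear
# interpolation. The configuration format and frame count are assumptions.

def interpolate(a, b, t):
    """Blend two configurations (lists of numbers) at parameter t in [0, 1]."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

def animate(key_frames, frames_per_step=10):
    """Expand a list of key-frame configurations into animation frames."""
    frames = []
    for start, end in zip(key_frames, key_frames[1:]):
        for i in range(frames_per_step):
            frames.append(interpolate(start, end, i / frames_per_step))
    frames.append(key_frames[-1])   # finish on the final key frame
    return frames

# Example: three key frames for a single joint angle.
print(len(animate([[0.0], [45.0], [90.0]])))   # 21 frames
```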

Figure 2 shows a screen capture of the sign transcriber. To enter the configuration for a particular time step, users first specify either the left or the right hand and select a handshape from the database. Figure 3 shows the pop-up window of handshapes. Users then specify the palm orientation for the hand by selecting from a list box ("up", "down", etc.). Next comes the specification of the hand location. A location is defined by two parameters: vertical height and horizontal space. Height is specified by raising a bar in a side view of a human body (the "business woman"), and the horizontal space is specified by selecting a button at the corresponding position in a top view of the body. After the user repeats this process for the other hand, the specification for the time step is complete. At this point, the hand information is displayed graphically as part of a full body (the "bubbleman") in the Perspective window. The user continues entering specifications for further time steps. Selecting the Sign menu from the top menu bar then displays the resulting animation in the Perspective window.

Figure 2: The Sign Transcriber
[Image: screen capture of the sign transcriber interface.]

Figure 3: Hand Selection in the Sign Transcriber
[Image: the pop-up window for selecting a handshape from the database.]
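To make the two location parameters concrete, the following sketch combines a height value and a horizontal selection into a single 3D hand position; the coordinate conventions, grid, and units are illustrative assumptions rather than the transcriber's actual representation.

```python
# Illustrative sketch: combine the two location parameters (vertical height
# from the side view, horizontal space from the top view) into one 3D hand
# position. The coordinate frame, units, and grid are all assumptions.

# Hypothetical top-view grid: x is left/right offset, z is distance in front
# of the body, both in centimeters.
HORIZONTAL_POSITIONS = {
    "left":   (-25.0, 30.0),
    "center": (  0.0, 30.0),
    "right":  ( 25.0, 30.0),
}

def hand_location(height_cm, horizontal):
    """Return an (x, y, z) hand position from the two location parameters."""
    x, z = HORIZONTAL_POSITIONS[horizontal]
    return (x, height_cm, z)

print(hand_location(140.0, "right"))   # e.g. a hand at chin height, to the right
```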

 

3. Future Work

Our initial usability studies have been quite promising. On average, native ASL signers took 10 minutes to learn enough about the hand transcriber to create handshapes, and transcribing a handshape took an average of 82 seconds. To make the transcription system truly useful, more in-depth usability testing by ASL signers with various levels of computer literacy is needed.

Another important extension is to incorporate the non-manual features of ASL, namely facial expression and body motion, into the database. In ASL, non-manual signals are just as critical as the hands in defining signs. A transcription system that incorporates all of these aspects of ASL could be used as a tool for teaching ASL, for computer-aided ASL tutoring, and for many other tasks.

References




Reprinted with author(s) permission. Author(s) retain copyright.