2001 Conference Proceedings



CARPI: A DATABASE FOR COMPUTER GENERATED FINGERSPELLING

Clarke Steinback, Ph.D.
Computer Science Department
California State University, Chico
Chico, California 95929
ranger@ecst.csuchico.edu

INTRODUCTION

A language such as American Sign Language (ASL) is a complicated visual system involving the hands, arms, torso, head, and facial expressions, with both static and dynamic aspects to its signs. ASL is a natural language used throughout the United States; it is the third or fourth most widely used language in the US (Sternberg 1996). Translation from voice to English text is frequently the foundation of assistive technology. However, as Wolf (Wolf 1999) indicates, the voice-to-English-text translation used in television closed captioning is not effective as an assistive technology. Indeed, as Holt (Holt 1994) points out, most ASL signers read English text at a third or fourth grade level. The production of signed English can be seen as an intermediate stage, with true ASL translation as the ultimate goal. In any translation system from voice to sign, a database of signs is necessary for producing either ASL or signed English. Storing the information needed to reproduce the signs in such a database is complex and requires careful and rigorous planning.

In creating an appropriate database for the development of a translation system, it is possible to first consider fingerspelling as a subset of ASL, acknowledging that this phase is a prototype with limited direct application. Although many people who are deaf or hearing impaired read text fluently, for those who learned ASL in their youth, ASL is their mother tongue. Fingerspelling is nonetheless an acceptable starting point from which to test the qualities essential to developing an animation-based translation system and its underlying database. Focusing on fingerspelling alone simplifies the problem to producing correct gestural movements of a single hand. Although the individual fingerspelling signs, consisting of hand positions and motion components, are not in themselves excessively complicated, two aspects are key to making computer animated fingerspelling successful for assistive technology: realistic rendering of the human hand and realistic animation of the transitions between letter pairs.

Letters occur in streams, not as isolated elements, so animated fingerspelling needs to display the transitions between them effectively. Viewers readily notice incorrectly animated human figures, so any animation of fingerspelling should have realistic signs and realistic transitions between signs. Realistic animation of human hands for assistive technology is more complicated than simply placing a rendered hand at the end of an animated arm. This study achieved such realistic animation by capturing transition information and using it in the animation.

BACKGROUND AND RELATED WORK

In the development of assistive technology to aid in translating text to sign language, two major avenues have already been proposed: one uses video clip technology and the other uses computer generated animation. Video clip technology (Haas 1992) provides continuous sign language display and is often used in educational and training programs. However, these systems cannot function as translation devices beyond their pre-recorded phrases, and they require enormous storage capacity to hold enough video clip phrases to support translation. Once we reach beyond pre-recorded phrases, the natural inbetweening inherent in the original video clips is lost, and recreating it is difficult, if not nearly insurmountable. The second approach uses computer generated graphics to assemble new phrases out of the words or letters that the system knows how to generate (Magnenat-Thalmann 1990). In such systems, the computer renders the signs as needed for the translation, providing the flexibility that translation requires. However, realistic rendering and motion can consume considerable processor time or require expensive, non-portable computer systems. An assistive translation system must not only provide the user with understandable images but also be portable and operate in real time.

To address the issues of computer generated signing, several researchers have focused on fingerspelling as a subset of the overall signing problem (Holden 1992, Lee 1992, Lee 1993, Steinback 1997, Steinback 1999). Holden and Roy (Holden 1992) developed a system that translates English text to fingerspelling as a sign language training tool using 2D animation. Lee et al. (Lee 1992, Lee 1993) provide a notation system for producing hand animation for synthetic actors and for sign language translation with 3D rendered hand animation. Su and Furuta (Su 1998) created a web-based educational system using a hand posing system. Steinback (Steinback 1997, Steinback 1999) produced a system that translates English text to fingerspelling with 3D animation. Unlike previous endeavors, this system runs on a personal computer, obtains inbetween positions both by linear interpolation and by capture with the CyberGlove, and offers varying levels of animation realism.

The animated fingerspelling and sign language systems developed thus far use linear interpolation and keyframing to produce the motion between individual signs. Steinback (Steinback 1997, Steinback 1999) demonstrated that sign pairs can be treated as repetitive motion for which controllers can be developed, much like Hodgins's work with human athletic motion in animation (Hodgins 1995). Glove input provides the opportunity to capture and store this transitional information. Other than Steinback (Steinback 1997, Steinback 1999), none of these animated systems has addressed what level of rendering realism and motion realism is appropriate for the user, and most have not considered the use of portable systems for assistive technology.
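For comparison with the captured transitions used in this study, the following minimal Python sketch shows the keyframe-and-linear-interpolation baseline that earlier systems rely on. It assumes a pose is simply a list of joint angles, one per degree of freedom; the names `lerp_pose` and `keyframe_transition` are hypothetical and are not taken from any of the cited systems.

```python
from typing import List, Sequence

# Hypothetical representation: one joint angle (in degrees) per degree of freedom.
Pose = List[float]


def lerp_pose(a: Sequence[float], b: Sequence[float], t: float) -> Pose:
    """Linearly interpolate every degree of freedom from pose a (t=0) to pose b (t=1)."""
    if len(a) != len(b):
        raise ValueError("poses must have the same number of degrees of freedom")
    return [x + (y - x) * t for x, y in zip(a, b)]


def keyframe_transition(end_of_first: Sequence[float],
                        start_of_second: Sequence[float],
                        frames: int) -> List[Pose]:
    """Return `frames` evenly spaced inbetween poses, excluding both keyframes."""
    return [lerp_pose(end_of_first, start_of_second, i / (frames + 1))
            for i in range(1, frames + 1)]


# Usage: two inbetweens between a two-DOF ending pose and a two-DOF starting pose.
inbetweens = keyframe_transition([0.0, 90.0], [30.0, 0.0], frames=2)
```

Every joint moves at a constant rate along a straight line in joint space, which is exactly the mechanical quality that captured transitions are meant to avoid.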

This study establishes an approach for investigating the graphical elements necessary for a translation system. Developing animation sufficient for fingerspelling enables future research to extend the fingerspelling results to signed English translation. Once the natural language problem is solved, these findings on graphical elements can support the production of a true ASL translation device.

COMPUTER ANIMATION REPRESENTATIONAL POSITION INDEX (CARPI)

The following are the highlights of the creation of this computerized fingerspelling database for assistive technology, the Computer Animation Representational Position Index (CARPI). Because English text contains only a fixed number of letters (and a fixed number of letter combinations), ASL fingerspelling can be viewed as repetitive motion directed by a script (the text). A hand model was therefore used to record the positions of the hand and fingers captured with a CyberGlove, and a library of poses and intermediate poses (much like a controller) was created. A display program was developed to read the text in letter pairs and use the library of intermediate positions to produce the corresponding animation. This approach allows the system's processor to be used for animation rather than for calculating constraints. The objectives were to develop methods to (1) capture sign pair geometry information and store it for replay; (2) edit sign pair information and select archetypal sign images; and (3) transition between ending letter signs and beginning letter signs. A hand model was developed and used throughout the capture, edit, and transition segments of the research.
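The pair-driven display described above can be sketched as a lookup table keyed by ordered letter pairs. This is a minimal Python sketch under stated assumptions, not the CARPI's actual storage format: the class name `Carpi`, its two dictionaries, and the representation of a pose as a tuple of joint values are all hypothetical.

```python
import string
from typing import Dict, List, Tuple

# Hypothetical representation: one joint value per degree of freedom.
Pose = Tuple[float, ...]


class Carpi:
    """Hypothetical in-memory stand-in for the CARPI.

    letter_pose maps a letter to its archetypal pose; transition maps an ordered
    letter pair to the captured frames that lie between the two archetypes.
    """

    def __init__(self) -> None:
        self.letter_pose: Dict[str, Pose] = {}
        self.transition: Dict[Tuple[str, str], List[Pose]] = {}

    def frames_for(self, text: str) -> List[Pose]:
        """Assemble the animation for `text` by walking it one letter pair at a time."""
        letters = [c for c in text.upper() if c in string.ascii_uppercase]
        if not letters:
            return []
        frames: List[Pose] = [self.letter_pose[letters[0]]]
        for first, second in zip(letters, letters[1:]):
            # Replay the stored transition for this pair, then land on the next letter.
            frames.extend(self.transition[(first, second)])
            frames.append(self.letter_pose[second])
        return frames


db = Carpi()
db.letter_pose.update({"H": (0.0,), "I": (1.0,)})
db.transition[("H", "I")] = [(0.5,)]
print(db.frames_for("hi"))  # [(0.0,), (0.5,), (1.0,)]
```

Playback then reduces to table lookups, which is consistent with the goal of spending the processor on animation rather than on constraint calculation.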

HAND MODEL

This study used an object-oriented model of a hand having 19 bones and 15 joints, providing 24 degrees of freedom (DOF), closely following Rijpkema's hand model (Rijpkema 1991). The basic motions and configurations of the human hand can be modeled with these 24 degrees of freedom. To obtain the appropriate proportions and fixed rotation components for the bones, physical measurements of actual human hands and measurements of the hand models in Wilson and Wilson (Wilson 1978) were used.

CAPTURE

In the capture segment, an object-oriented data acquisition environment was developed to capture the hand geometry of letter signs and store the geometry data in the CARPI. This environment addressed the first objective: a method to capture sign pair geometry information and store it for later replay.

The data acquisition environment uses a CyberGlove and an Ascension positional tracker to obtain, at 34 measurements per second, the hand geometry for each letter sign and the transition information between sign pairs, and then stores the data in the CARPI. For the CARPI to contain all possible letter sign combinations, the environment was used to capture the hand geometry of each letter sign (26 items) and the transitional geometry for each letter combination (26 x 26 = 676 ordered pairs). To represent each letter sign consistently throughout, an expert signer identified archetypal hand poses from the captured signs.
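The capture inventory and sampling rate above can be checked with a short Python sketch. The helper names `capture_plan` and `samples_in` are hypothetical; the only figures taken from the text are the 26 letters, the 26 x 26 ordered pairs (which include doubled letters such as the LL in HELLO), and the rate of 34 measurements per second.

```python
import string
from itertools import product
from typing import List, Tuple

SAMPLES_PER_SECOND = 34  # glove and tracker measurement rate reported in the text


def capture_plan() -> Tuple[List[str], List[Tuple[str, str]]]:
    """Every item to capture: each letter sign and each ordered letter pair."""
    letters = list(string.ascii_uppercase)
    # product(..., repeat=2) yields ordered pairs, including doubled letters like ("L", "L").
    pairs = list(product(letters, repeat=2))
    return letters, pairs


def samples_in(duration_s: float) -> int:
    """Whole measurements recorded during a capture lasting duration_s seconds."""
    return int(duration_s * SAMPLES_PER_SECOND)


letters, pairs = capture_plan()
print(len(letters), len(pairs), len(letters) + len(pairs))  # 26 676 702
print(samples_in(1.5))  # 51
```

Because the pairs are ordered, AB and BA are captured separately, which matches the text's count of 676 rather than the 351 unordered combinations.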

The capture segment provided the means for capturing and storing the geometries needed to produce the individual sign images and the transitions between each letter pair. Geometry capture was automatic, although calibration was required before each sign pair capture.
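The paper does not describe its calibration procedure. As an illustration only, the sketch below assumes a simple approach often used with bend-sensor gloves: a per-sensor linear gain and offset fitted from two reference poses, here hypothetically a flat hand (0 degrees) and a fully bent hand (90 degrees). The function names and reference angles are assumptions, not the study's method or the CyberGlove software's API.

```python
from typing import List, Sequence, Tuple


def two_point_calibration(raw_flat: Sequence[float],
                          raw_bent: Sequence[float],
                          angle_flat: float = 0.0,
                          angle_bent: float = 90.0) -> List[Tuple[float, float]]:
    """Fit (gain, offset) per sensor so that angle = gain * raw + offset.

    raw_flat and raw_bent are readings taken while the wearer holds two
    hypothetical reference poses whose joint angles are angle_flat and angle_bent.
    """
    if len(raw_flat) != len(raw_bent):
        raise ValueError("both reference readings must cover the same sensors")
    params: List[Tuple[float, float]] = []
    for lo, hi in zip(raw_flat, raw_bent):
        if hi == lo:
            raise ValueError("a sensor did not change between the reference poses")
        gain = (angle_bent - angle_flat) / (hi - lo)
        params.append((gain, angle_flat - gain * lo))
    return params


def to_angles(raw: Sequence[float], params: Sequence[Tuple[float, float]]) -> List[float]:
    """Convert one frame of raw sensor readings to joint angles in degrees."""
    return [gain * r + offset for r, (gain, offset) in zip(raw, params)]


params = two_point_calibration(raw_flat=[100.0, 40.0], raw_bent=[190.0, 220.0])
print(to_angles([145.0, 130.0], params))  # [45.0, 45.0]
```

Recalibrating before each capture, as the study did, keeps the mapping valid even if the glove shifts on the hand between sessions.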

EDIT / TRANSITION

The editing segment of the study was designed to allow replay and modification of the information stored in the CARPI. This segment addressed the second and third objectives of the study: development of a method to edit sign pair information and select archetypal sign images, and development of a method to transition between ending letter signs and beginning letter signs, respectively. The purpose of the modifications made with the editing environment was to create transitions between pairs such that there was no discernible discontinuity for the user between ending and beginning letter signs.

In the transition segment, the editing program was used to select the degree of linear interpolation between the archetype and the captured letter pair data, at both the beginning and the end of the pair. As a result, the beginning and ending geometries of each pair matched the selected archetype geometry, so that the captured geometries could be blended from the archetype at the start of the pair and into the archetype at its end.
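One way to read the transition step above is as a ramped linear blend at each end of a captured clip. The following Python sketch assumes that reading and is not the original editing program: `ramp` is a hypothetical parameter standing in for the selected level of interpolation and sets how many frames at each end are blended, so that the first frame equals the starting letter's archetype and the last frame equals the ending letter's archetype.

```python
from typing import List, Sequence

Pose = List[float]  # hypothetical: one joint angle per degree of freedom


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Pose:
    """Linear interpolation from a (t=0) to b (t=1), applied to each degree of freedom."""
    return [x + (y - x) * t for x, y in zip(a, b)]


def blend_pair(captured: Sequence[Sequence[float]],
               start_archetype: Sequence[float],
               end_archetype: Sequence[float],
               ramp: int) -> List[Pose]:
    """Pin a captured sign-pair clip to the archetypes at both ends."""
    n = len(captured)
    if ramp < 1 or 2 * ramp > n:
        raise ValueError("ramp must be at least 1 and fit twice within the clip")
    out: List[Pose] = []
    for i, frame in enumerate(captured):
        if i < ramp:
            # Blend from the start archetype: weight on captured data is 0 at i == 0.
            out.append(lerp(start_archetype, frame, i / ramp))
        elif i >= n - ramp:
            # Blend to the end archetype: weight on it reaches 1 at the last frame.
            out.append(lerp(frame, end_archetype, (i - (n - ramp) + 1) / ramp))
        else:
            # Middle frames replay the captured geometry unchanged.
            out.append(list(frame))
    return out


clip = [[10.0]] * 5  # five captured frames of a one-DOF toy clip
print(blend_pair(clip, start_archetype=[0.0], end_archetype=[20.0], ramp=2))
# [[0.0], [5.0], [10.0], [15.0], [20.0]]
```

Pinning both ends to the archetypes is what allows any two clips that share a letter to be concatenated without a jump: the XA clip ends on the archetype for A, and the AY clip begins on it.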

RESULTS

The successful creation of the CARPI is reflected in the fact that the computer generated fingerspelled words were understood by people fluent in ASL. Acquiring the data with the glove was far less time-consuming than taking manual measurements. The entire glove data collection for all 26 letters and all 676 letter pairs took less than eight hours in total, an average of less than one minute per functional sign pair.
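The per-item figure follows from the reported totals. Counting both the 26 letter signs and the 676 letter pairs, and treating eight hours as an upper bound:

```latex
\begin{align*}
8\ \text{h} &= 480\ \text{min} \\
26 + 676 &= 702\ \text{captured items} \\
\frac{480\ \text{min}}{702} &\approx 0.68\ \text{min per item}
\qquad \left(\frac{480\ \text{min}}{676} \approx 0.71\ \text{min per pair}\right)
\end{align*}
```

Both values are upper bounds, consistent with the stated average of under one minute per sign pair.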

The editing environment successfully allowed selection of archetypal letter signs and modification of sign pairs. Because it included a computer generated fingerspelling display system, it allowed sign pairs and words to be tested in order to select a suitable linear interpolation for smoothing the transitions between sign pairs. The editing environment provided a relatively simple method for transitioning between ending letter signs and beginning letter signs. Because choosing the archetypal sign image that best displays the characteristics of a sign requires many subjective decisions, this method cannot be completely automated. The editing environment produced data, stored in the CARPI, that was successfully used to produce animation with no discernible discontinuity for the user.

CONCLUSION

First, the system used in this study developed a capture method that readily captures and stores all the sign pair information in the CARPI. More important, the system can also replay the information stored in the CARPI, providing a method of sign pair replay that blends the ending and beginning letters of sign pairs so that observers perceive no discernible discontinuity. The CARPI produced by this study could therefore be used reliably in several phases of the progression toward the next stage: computer generated signed English.

This study has demonstrated that capturing sign geometries and storing them in the CARPI as sign pair controllers produces more realistic motion of the sign pairs than linear interpolation between the sign pairs does. The CARPI developed in this study thus supports the validity of using controllers, rather than linear interpolation, for more realistic animation of fingerspelling sign pairs.

Overall, the system showed that recognizable fingerspelling images can be produced from text input on inexpensive laptop computers. The glove capture system made capturing the hand geometries easier and less time-consuming than manual measurement. The display program, linked to the CARPI database on a laptop computer, produced recognizable fingerspelling images. The effective and efficient graphical representation of the signs and their transitions sets the stage for the progression from understandable fingerspelling to sign language translation.

FUTURE WORK

Further work is needed to determine the level of hand image quality, and the level of motion realism in successive hand images, that is sufficient both for user understanding and for timely text-to-fingerspelling display. User interface issues such as the hand image presentation rate, hand image size, and the background images needed to accommodate understanding by different users should also be addressed.

REFERENCES






Reprinted with author(s) permission. Author(s) retain copyright.