2001 Conference Proceedings
CARPI: A DATABASE FOR COMPUTER GENERATED FINGERSPELLING
Clarke Steinback, Ph.D.
Computer Science Department
California State University, Chico
Chico, California 95929
ranger@ecst.csuchico.edu
INTRODUCTION
A language such as American Sign Language (ASL) is a complicated
visual system involving hands, arms, torso, head and facial
expressions with static and dynamic aspects to the signs. ASL is
a natural language used throughout the United States; it is the
third or fourth most widely used language in the US (Sternberg,
1996). Translation from voice to English text is frequently the
foundation of assistive technology. As Wolfe (Wolfe 1999)
indicates, the voice to English text translation used in closed
captioning on television as an assistive technology is not
effective. Indeed, as Holt (Holt 1994) points out, most ASL
signers read English text at a third or fourth grade level. The
production of signed English can be seen as an intermediary
stage, with true ASL translation the ultimate stage. In any
translation system from voice to sign, a database of signs is
necessary for producing either ASL or signed English. In creating
such a database, storing the information to reproduce the signs
is complex and requires careful and rigorous planning.
In creating an appropriate database for development of a
translation system it is possible to first consider
fingerspelling as a subset of ASL, acknowledging that this phase
is prototypic with limited direct application. Although many who
are deaf or hearing impaired read text fluently, for those who
learned ASL in their youth, ASL is their mother tongue.
Fingerspelling is thus an acceptable starting point from which to
test the qualities essential to developing an animation-based
translation system and its underlying database. Focusing just on
fingerspelling for use in assistive technology simplifies the
problem to that of correct gestural movements of a single hand.
Although the individual fingerspelling signs consisting of hand
positioning and motion components are not in themselves
excessively complicated, two key aspects make computer animated
fingerspelling successful for assistive technology -- realistic
human hand rendering and realistic animation of transitions
between letter pairs.
Letters occur in streams, not as isolated elements; animation of
fingerspelling needs to display transitions effectively. Viewers
notice incorrectly animated human figures. Any animation of
fingerspelling should have realistic signs and transitions
between signs. Realistic animation of human hands for assistive
technology is more complicated than just placing a rendered hand
at the end of an animated arm. This study has achieved this
realistic animation by capturing and using transitions
information.
BACKGROUND AND RELATED WORK
In the development of assistive technology to aid in translating
text to sign language, two major avenues have already been
proposed: one using video clip technology and the other using
computer generated animation. Video clip technology (Haas 1992)
provides for continuous sign language display and is often used
in educational and training programs. However, these systems
cannot function as translation devices beyond the pre-recorded
phrases, and they require enormous storage capacity to
accommodate sufficient video clip phrases to allow for
translation functions. Once we reach beyond pre-recorded phrases,
the natural inbetweening inherent in the original video clips is
lost, and recreating it is a nearly insurmountable difficulty. The
second approach uses computer generated graphics to allow for
assembling new phrases out of the words/letters that the system
knows how to generate (Magnenat-Thalmann 1990). In such systems,
the computer renders the signs as needed for the translation,
thus providing the flexibility necessary for translation. However,
realistic rendering and motion of the animation can consume
considerable processor time or require expensive, non-portable
computer systems. An assistive translation system not only must
provide the user with understandable images, but must also be
portable and real-time.
To address the issues of computer generated signing, several
researchers have focused on fingerspelling as a subset of the
overall signing issue (Holden 1992, Lee 1992, Lee 1993, Steinback
1997, Steinback 1999). Holden and Roy (Holden 1992) developed a
system to translate English text to fingerspelling as a sign
language training tool using 2D animation. Lee et al. (Lee
1992, Lee 1993) provide a notation system to produce hand
animation for synthetic actors and sign language translation with
3D rendered hand animation. Su and Furuta (Su 1998) created a
web-based educational system using a hand posing system.
Steinback (Steinback 1997, Steinback 1999) produced a system to
translate English text to fingerspelling that provides 3D
animation. Unlike previous endeavors, this system runs on a personal
computer, obtains inbetween positions both by linear interpolation
and by capture via the CyberGlove, and offers varying levels of
animation realism.
The animated fingerspelling and sign language systems developed
thus far use linear interpolation and keyframing to produce the
motion between the individual signs. Steinback (Steinback 1997,
Steinback 1999) demonstrated that sign pairs can be considered
repetitive motion for which controllers can be developed much
like Hodgins' (Hodgins 1995) work with human athletic motion in
animation. The use of glove input can provide the opportunity to
capture and store the transitional information. Other than
Steinback (Steinback 1997, Steinback 1999), none of these
animated systems addressed what is appropriate for the user in
terms of rendering realism and motion realism, nor have most
considered usage of portable systems for assistive
technology.
This study sets the tone for how investigations into the
graphical elements necessary for a translation system can be
conducted. Developing animation sufficient for fingerspelling
enables future research to extrapolate the fingerspelling results
necessary for creating signed English translation. Once the
natural language problem is solved, a true ASL translation device
can be produced because of these findings on graphical
elements.
COMPUTER ANIMATION REPRESENTATIONAL POSITION INDEX (CARPI)
Following are the highlights of the creation of this computerized
fingerspelling database for assistive technology -- the Computer
Animation Representational Position Index (CARPI). Because only a
fixed number of letters exist in English text (and a fixed number
of combinations of the letters), ASL fingerspelling can be viewed
as repetitive motion directed by a script (the text). Thus, a
hand model for recording the position of the hand and fingers
using a CyberGlove was used. A library of poses and intermediate
poses (much like a controller) was created. A display program was
developed to read the text in letter pairs and use the library of
intermediate positions to produce corresponding animation. This
approach allows the processor of the system to be used for
animation rather than the calculation of constraints. The
objectives consisted of the development of three methods: to
capture sign pair geometry information and store it for replay;
to edit sign pair information and select archetypal sign images;
and to transition between ending letter signs and beginning
letter signs. A hand model was developed and used throughout the
capture, edit, and transition segments of the research.
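The playback idea described above, reading the text in letter pairs
and looking up stored intermediate poses, can be sketched as follows.
This is a minimal illustration in Python under stated assumptions,
not the study's implementation: the names letter_pairs and
frames_for_word, the dictionary used as the pose library, and the
representation of a frame as a plain list of joint angles are all
hypothetical.

```python
import string
from typing import Dict, List, Tuple

# Hypothetical representation: one frame is a list of joint angles,
# one float per degree of freedom of the hand model.
Frame = List[float]
PairLibrary = Dict[Tuple[str, str], List[Frame]]


def letter_pairs(word: str) -> List[Tuple[str, str]]:
    """Split a word into consecutive letter pairs.

    Characters other than the 26 English letters are ignored.
    A word with fewer than two letters yields no pairs.
    """
    letters = [ch for ch in word.lower() if ch in string.ascii_lowercase]
    return list(zip(letters, letters[1:]))


def frames_for_word(word: str, transitions: PairLibrary) -> List[Frame]:
    """Concatenate the stored transition frames for each letter pair.

    Simplification: frames are joined as-is, so the pose shared by two
    adjacent pairs appears once at the end of one pair and once at the
    start of the next.
    """
    frames: List[Frame] = []
    for pair in letter_pairs(word):
        frames.extend(transitions[pair])
    return frames
```

Because every frame is looked up rather than computed, the display
loop in such a design only has to render poses, which matches the
stated goal of spending processor time on animation instead of
constraint calculation.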
HAND MODEL
This study utilized an object-oriented model of a hand having 19
bones and 15 joints providing 24 degrees of freedom (DOF),
closely following Rijpkema's hand model (Rijpkema 1991). The
basic motions and configurations of the human hand can be modeled
based on these twenty-four degrees of freedom. To obtain the
appropriate proportions and fixed rotation components for the
bones, physical measurements of actual human hands and
measurements obtained from hand models in Wilson and Wilson
(Wilson 1978) were used.
CAPTURE
In the capture segment, an object-oriented data acquisition
environment was developed to capture hand geometry of letter
signs and store the geometry data in the CARPI. This data
acquisition environment was designed to develop a method to
capture sign pair geometry information and store this information
for later replay.
The data acquisition environment uses a CyberGlove and an
Ascension positional tracker to obtain hand geometry for a letter
sign and the transition information between sign pairs at 34
measurements per second. It then stores the data in the CARPI.
For the CARPI to contain all possible letter sign combinations,
the data acquisition environment was used to capture the hand
geometry of each letter sign (26 items), and the transitional
geometry for each letter combination (26 x 26 item pairs -- 676).
To represent each letter sign consistently throughout, an expert
signer identified archetypal hand poses from the captured
signs.
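The size of the capture task follows directly from the alphabet. The
sketch below enumerates the entries such a database would need and
converts a capture duration into a number of measurements at the
stated 34 measurements per second. The key format (single letters
and two-letter strings) and the function names are assumptions made
for illustration; the source does not describe how the CARPI is
keyed.

```python
import string
from itertools import product
from typing import List, Tuple

LETTERS = string.ascii_lowercase   # the 26 letter signs
SAMPLE_RATE_HZ = 34                # measurements per second, from the text


def carpi_keys() -> Tuple[List[str], List[str]]:
    """Return hypothetical keys for letter signs and for letter pairs."""
    sign_keys = list(LETTERS)
    # Every ordered combination, including doubled letters such as "ll".
    pair_keys = [a + b for a, b in product(LETTERS, repeat=2)]
    return sign_keys, pair_keys


def measurements_in_capture(duration_s: float) -> int:
    """Number of glove measurements recorded in duration_s seconds."""
    return round(duration_s * SAMPLE_RATE_HZ)


sign_keys, pair_keys = carpi_keys()
print(len(sign_keys), len(pair_keys))  # 26 676
```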
The capture segment provided the means for capturing and storing
the geometries necessary for producing the individual sign images
and the transitions between each letter pair. This allowed for
automatic capture of the geometry information, with calibration
required before each sign pair capture.
EDIT / TRANSITION
The editing segment of the study was designed to allow replay and
modification of the information stored in the CARPI. This segment
addressed the second and third objectives of the study:
development of a method to edit sign pair information and select
archetypal sign images, and development of a method to transition
between ending letter signs and beginning letter signs,
respectively. The purpose of the modifications made with the
editing environment was to create transitions between pairs such
that there was no discernible discontinuity for the user between
ending and beginning letter signs.
In the transition segment the editing program was used to select
the level of linear interpolation of the archetype and the actual
letter pair for both the beginning of the pair and the end of the
pair. As a consequence, the beginning and ending geometries of
each pair matched the selected archetype geometry, enabling the
actual captured geometries to be blended either from that
geometry or to that geometry, respectively.
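One way to picture this blending is shown below: the first few
captured frames are linearly interpolated from the starting letter's
archetype toward the captured data, and the last few are
interpolated from the captured data toward the ending letter's
archetype. This is a hedged sketch, not the editing program itself.
The function names, the blend_frames parameter, and the linear
weighting schedule are assumptions; in the study the level of
interpolation was selected by hand in the editing environment.

```python
from typing import List, Sequence

Frame = List[float]  # hypothetical: one joint angle per degree of freedom


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Frame:
    """Linear interpolation between two poses: t=0 gives a, t=1 gives b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def blend_to_archetypes(
    captured: Sequence[Sequence[float]],
    start_archetype: Sequence[float],
    end_archetype: Sequence[float],
    blend_frames: int,
) -> List[Frame]:
    """Pull both ends of a captured transition onto the archetype poses.

    The first frame becomes exactly start_archetype and the last frame
    exactly end_archetype; frames further inside keep more of the
    captured motion, and the middle of the transition is left untouched.
    """
    n = len(captured)
    k = max(0, min(blend_frames, n // 2))  # never let the two ramps overlap
    out = [list(frame) for frame in captured]
    for i in range(k):
        t = i / k  # weight of the captured data: 0 at the very edge
        out[i] = lerp(start_archetype, captured[i], t)
        out[n - 1 - i] = lerp(end_archetype, captured[n - 1 - i], t)
    return out
```

Since every pair that ends in a given letter finishes on the same
archetype pose, and every pair that begins with that letter starts
on it, consecutive pairs join without a jump at the shared letter.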
RESULTS
The successful creation of the CARPI is reflected in the fact
that the computer generated fingerspelled words were understood
by those fluent in ASL. The acquisition of the data using the
glove was far less time-consuming than manual measurements. The
entire data collection with the glove for all 26 letters and all
676 letter pairs took less than a total of eight hours, an
average of less than one minute to produce each functional sign
pair.
The editing environment successfully allowed selection of
archetypal letter signs and modification of sign pairs. This
environment containing a computer generated fingerspelling
display system allowed testing of sign pairs and words in order
to successfully select a linear interpolation for the smoothing
of transitions between sign pairs. The editing environment
provided a relatively simple method for transitions between
ending letter signs and beginning letter signs. Because choosing
an archetypal sign image that best displays the characteristics
of the sign requires many subjective decisions, this method
cannot be completely automated. The editing environment produced
data stored in the CARPI that was successfully used to produce
animation with no discernible discontinuity for the user.
CONCLUSION
First and foremost, the system used in this study developed a
capture method that provides the means to readily capture and
store all the sign pair information in the CARPI. Perhaps more
important, however, the system can also replay the information
stored in the CARPI, thus providing a method for producing sign
pair replay that blends the ending and beginning letters of sign
pairs so that subjective observers perceive no discernible
discontinuity. The CARPI produced by this study could, therefore,
be used reliably in several phases of the progression toward the
next stage: computer generated signed English.
This study has demonstrated that the capture of sign geometries
and their storage in the CARPI as sign pair controllers creates a
means for producing more realistic motion of the sign pairs than
does linear interpolation between the sign pairs. Thus, the CARPI
developed in this study underscores the validity of the concept
of using controllers for more realistic animation of
fingerspelling sign pairs than the linear interpolated
method.
Overall, the system showed that recognizable fingerspelling
images can be produced from text input using inexpensive laptop
computers. The glove capture system allowed for easier and less
time-consuming capture of the hand geometries than manual
measurements. The display program linked to the CARPI database on
a laptop computer produced recognizable fingerspelling images.
The effective and efficient graphical representation of the signs
and their transitions sets the stage for the progression from
understandable fingerspelling to sign language translation.
FUTURE WORK
Further work is needed to determine the level of hand image
quality and the level of motion realism of the successive hand
images sufficient for both the understanding by the user and the
timely display of text-to-fingerspelling. Also, user interface
issues such as the hand image presentation rate, hand image size,
and background images needed to accommodate understanding by
different users should be addressed.
REFERENCES
- Haas, C., and S.X. Wei, "Stanford American Sign Language
Videodisc Project." Proceedings of the Johns Hopkins National
Search for Computing Applications to Assist Persons with
Disabilities (1-5 Feb. 1992). Los Alamitos, CA, USA: IEEE
Computer Society Press, 1992. p. 41-44.
- Hodgins, J. "Animating Human Athletes." SIGGRAPH Conference
Proceedings, 29(4):71-78, August 1995.
- Holden, E.J. and G.G. Roy. "The Graphical Translation of
English Text into Signed English in the Hand Sign Translator
System." Eurographics. 11(3):C357-C366. 1992.
- Holt, J., Demographic, Stanford Achievement Test 8th Edition
for Deaf and Hard of Hearing Students: Reading Comprehension
Subgroup Results.
1994. http://www.gallaudet.edu/~cadsweb/sat-read.html
- Lee, J. and T.L. Kunii. "Hand Motion Coding System for
Algorithm Recognition and Generation." Proceedings of Computer
Animations '92. Editors: N. Magnenat-Thalmann and D. Thalmann,
Springer-Verlag, Tokyo, Japan, 1992.
- Lee, J. and T.L. Kunii. "Models and Techniques in Computer
Animations." Proceedings of Computer Animations '93.
Springer-Verlag, Editors: N. Magnenat-Thalmann and D. Thalmann,
Tokyo, Japan, 1993.
- Magnenat-Thalmann, N. and D. Thalmann. Synthetic Actors in
Computer-Generated 3D films. Springer-Verlag, New York,
1990.
- Rijpkema, H., and M. Girard. "Computer Animation of
Knowledge-Based Human Grasping." SIGGRAPH Conference Proceedings,
25(4):339-348, July 1991.
- Steinback, C., Lodha, S., "Computer Animated Fingerspelling
for Assistive Technology" 12th Annual International Technology
and Persons with Disabilities Conference. Los Angeles, March
1997.
- Steinback, C., Computer Generated Fingerspelling for
Assistive Technology, Dissertation Abstracts International-B
60/03, p. 1177, September 1999.
- Sternberg, M., The American Sign Language Dictionary,
Multicom, 1996. (CD ROM)
- Su, S.A., and R.K. Furuta, "VRML-based Representations of ASL
Fingerspelling on the World-Wide Web." ASSETS '98: The Second
International ACM Conference on Assistive Technologies. Marina
del Rey, CA: ACM, 1998. pp. 43-45.
- Wilson, D.B. and W.J. Wilson. Human Anatomy, Oxford
University Press, New York, 1978.
- Wolfe, R. et al., "Interface for Transcribing American Sign
Language." In Proceedings of the 26th International Conference on
Computer Graphics and Interactive Techniques, Los Angeles
1999.
Reprinted with author(s) permission. Author(s) retain copyright.